Gardner Shea N, Lam Marisa W, Mulakken Nisha J, Torres Clinton L, Smith Jason R, Slezak Tom R
Computations, Lawrence Livermore National Laboratory, P.O. Box 808, L-174, Livermore, CA 94551, USA.
J Clin Microbiol. 2004 Dec;42(12):5472-6. doi: 10.1128/JCM.42.12.5472-5476.2004.
We built a system to guide decisions about how much genomic sequencing is required to develop diagnostic DNA signatures, which are short sequences sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species' genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess how many genome sequences of a target species and of its close phylogenetic relatives (near neighbors) are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures, and three near-neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near-neighbor genomes, since lack of conservation among strains is more limiting than lack of uniqueness. Severe acute respiratory syndrome (SARS) coronavirus and Ebola Zaire are exceptions: additional target genomes currently do not improve predictions, but near-neighbor sequences are urgently needed. Our results also indicate that double-stranded DNA viruses are more conserved among strains than RNA viruses are, since in most cases there was at least one conserved signature candidate for the DNA viruses and none for the RNA viruses.
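The selection criterion above (conserved across every target strain, absent from every other species) and the subsampling idea behind the simulations can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not describe the pipeline's alignment or scoring steps, so exact k-mer matching is assumed here. The functions `kmers`, `signature_candidates` and `mean_precision`, the precision score, the value of k, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq (exact matches only; an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_candidates(targets: list[str], non_targets: list[str], k: int) -> set[str]:
    """Hypothetical criterion: k-mers present in every target genome (conserved,
    guards against false negatives) and absent from every non-target genome
    (unique, guards against false positives)."""
    if not targets:
        return set()
    conserved = set.intersection(*(kmers(g, k) for g in targets))
    background = set().union(*(kmers(g, k) for g in non_targets))
    return conserved - background


def mean_precision(all_targets: list[str], all_non_targets: list[str], k: int,
                   n_targets: int, n_neighbors: int) -> float:
    """Hypothetical simulation: predict candidates from every subset of n_targets
    target genomes and n_neighbors non-target genomes, score each prediction by
    the fraction of its candidates that stay conserved and unique against ALL
    available genomes, and return the mean score over all subsets."""
    if not 1 <= n_targets <= len(all_targets) or not 0 <= n_neighbors <= len(all_non_targets):
        raise ValueError("subset sizes out of range")
    truth = signature_candidates(all_targets, all_non_targets, k)
    scores = []
    for t_sub in combinations(all_targets, n_targets):
        for nn_sub in combinations(all_non_targets, n_neighbors):
            predicted = signature_candidates(list(t_sub), list(nn_sub), k)
            # An empty prediction yields no usable signature, so it scores 0.
            scores.append(len(predicted & truth) / len(predicted) if predicted else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    targets = ["ACGTAC", "ACGTTT", "ACGAAA"]  # toy "strains" of the target species
    neighbors = ["TTTTTT"]                    # toy near-neighbor genome
    for n in range(1, len(targets) + 1):
        # prints "1 0.278", then "2 0.833", then "3 1.0"
        print(n, round(mean_precision(targets, neighbors, 3, n, 1), 3))
```

On this toy data, candidates predicted from fewer target genomes include k-mers that are not conserved across all strains, so the score rises as target genomes are added; dropping the near-neighbor genome (`n_neighbors=0`) lowers it, because shared k-mers are no longer filtered out. This shows the mechanics only and says nothing about real viral genomes.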