Martin M J, González-Candelas F, Sobrino F, Dopazo J
Tecnología para Diagnóstico e Investigación (TDI) S.A., c/Condes de Torreanaz, Madrid, Spain.
J Mol Evol. 1995 Dec;41(6):1128-38. doi: 10.1007/BF00173194.
Fast, accurate sequencing procedures, together with PCR, have led to a proliferation of studies of molecular-level variability in populations. Nevertheless, it is often impractical to examine long genomic stretches in a large number of individuals at the same time. To optimize this kind of study, we suggest a heuristic procedure for detecting the shortest region whose informational content can be considered sufficient for significant phylogenetic reconstruction. The method uses a simple index to compare the pairwise genetic distances obtained from a set of reference sequences with those obtained from windows of variable size and position. We also present an approach for testing whether the informative content of the stretches selected in this way differs significantly from that of the larger genomic regions used as reference. Applying this test to the VP1 protein gene of foot-and-mouth disease virus type C allowed us to define optimal stretches whose informative content is not significantly different from that of the complete VP1 sequence. The predictions made for type C sequences also held for type O sequences, indicating that the results of the procedure are consistent.
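The window scan described in the abstract can be illustrated with a minimal sketch. The abstract does not define the "simple index" or the distance measure, so the choices here are assumptions made only for illustration: distances are uncorrected p-distances, the index is the Pearson correlation between the full-length and window distance vectors, and the names `p_distance`, `pairwise_distances`, `pearson`, `shortest_informative_window`, and the `threshold` parameter are hypothetical. The significance test mentioned in the abstract is not reproduced.

```python
from itertools import combinations
from math import sqrt


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(seqs, start=0, end=None):
    """P-distances for every pair of sequences, restricted to columns [start, end)."""
    return [p_distance(s[start:end], t[start:end]) for s, t in combinations(seqs, 2)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def shortest_informative_window(seqs, threshold=0.9, min_size=1):
    """Return (start, size, score) for the shortest window whose index reaches
    `threshold`, or None if no window does.

    Assumption: the index is the correlation between window distances and
    full-length reference distances; the original paper's index may differ.
    """
    length = len(seqs[0])
    reference = pairwise_distances(seqs)
    for size in range(min_size, length + 1):
        best = None
        for start in range(length - size + 1):
            score = pearson(reference, pairwise_distances(seqs, start, start + size))
            if best is None or score > best[2]:
                best = (start, size, score)
        if best[2] >= threshold:
            return best
    return None


if __name__ == "__main__":
    # Toy alignment (invented for illustration, not data from the study).
    toy = ["AAAAAA", "AACAAA", "AACCAA"]
    print(shortest_informative_window(toy))
```

Scanning window sizes from smallest to largest means the first size that reaches the threshold is, by construction, the shortest one. A real analysis would need a corrected distance measure and a statistical test of the kind the abstract mentions, neither of which this sketch attempts.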