Qiao Baozhen, Goldberg Tony L, Olsen Gary J, Weigel Ronald M
Division of Epidemiology, Department of Pathobiology, University of Illinois, 2001 South Lincoln Avenue, Urbana, IL 61802, USA.
Infect Genet Evol. 2006 Jul;6(4):323-30. doi: 10.1016/j.meegid.2005.10.002. Epub 2006 Jan 6.
Partial genome sequencing (PGS) and restriction fragment analysis (RFA) are used frequently in molecular epidemiologic investigations, but their relative accuracy in phylogenetic reconstruction has not been assessed. In this study, 32 model phylogenetic trees with 16 extant lineages were generated, for which DNA sequences were simulated under varying conditions of genome length, nucleotide substitution rate, and between-site substitution rate variation. Genotyping using PGS and RFA was simulated. The effect of tree structure (stemminess, imbalance, lineage variation) on the accuracy of phylogenetic reconstruction (topological and branch length similarity) was evaluated. Overall, PGS was more accurate than RFA. The accuracy of PGS increased with increasing sequence length. The accuracy of RFA increased with the number of restriction enzymes used. In fragment size comparison, the Dice and Nei-Li algorithms differed little, and both were more accurate than the Fragment Size Distribution algorithm. For RFA, higher tree stemminess and longer genome length were associated with higher topological accuracy, whereas lower tree stemminess and lower substitution rate were associated with higher branch length accuracy. For PGS, lower tree imbalance was associated with higher topological accuracy, whereas lower tree stemminess, higher substitution rate, and lower between-site substitution rate variation were associated with higher branch length accuracy. RFA had higher topological accuracy than PGS only for the shortest sequence length (200 bp) at a low substitution rate, high tree stemminess, and long genome length. PGS had equal or higher accuracy than RFA in branch length reconstruction under all conditions investigated. Thus, partial genome sequencing is recommended over restriction fragment analysis for conditions within the parameter space examined.
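The abstract names the Dice algorithm as one of the fragment size comparison methods for RFA but gives no formula. As a rough illustration only, the sketch below computes the standard Dice coefficient, twice the number of shared fragments divided by the total number of fragments in the two patterns. The function name `dice_similarity`, the `tolerance` parameter, and the greedy matching rule are assumptions made for this example; they are not taken from the study, whose simulation details are not described here.

```python
def dice_similarity(fragments_a, fragments_b, tolerance=0.0):
    """Dice coefficient between two restriction fragment patterns.

    fragments_a, fragments_b: fragment sizes (e.g. in bp) for two isolates.
    tolerance: hypothetical size window within which two fragments are
               treated as the same band (an assumption for this sketch).
    Returns 2 * shared / (n_a + n_b), a value between 0.0 and 1.0.
    """
    a = sorted(fragments_a)
    b = sorted(fragments_b)
    if not a and not b:
        # Two empty patterns are treated as identical by convention here.
        return 1.0

    shared = 0
    i = j = 0
    # Walk both sorted lists, pairing each fragment at most once.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1

    return 2 * shared / (len(a) + len(b))


if __name__ == "__main__":
    # Made-up fragment sizes, for illustration only.
    isolate_1 = [100, 250, 400, 900]
    isolate_2 = [100, 250, 500]
    print(dice_similarity(isolate_1, isolate_2))
```

A similarity of this kind is typically converted to a distance (for example, 1 minus the coefficient) before tree building; which conversion and tree-building method the authors used is not stated in the abstract.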