Ogden T Heath, Rosenberg Michael S
Center for Evolutionary Functional Genomics, The Biodesign Institute, and the School of Life Sciences, Arizona State University, Tempe, Arizona 85287-4501, USA.
Syst Biol. 2006 Apr;55(2):314-28. doi: 10.1080/10635150500541730.
Phylogenies are often thought to depend more on the specifics of the sequence alignment than on the method of reconstruction. We simulated sequences containing insertion and deletion events to determine the role that alignment accuracy plays in phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparing hypothesized alignments with true alignments yielded two measures of alignment accuracy: one for the total data set and one for individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends across many analyses under different conditions; any one specific analysis, independent of its alignment accuracy, may recover a very accurate or a very inaccurate topology. Maximum likelihood and Bayesian methods generally outperformed neighbor joining and maximum parsimony in tree reconstruction accuracy. Results also indicated that as the lengths of a branch and of its neighboring branches increase, alignment accuracy decreases, and that the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can have important downstream effects on topological reconstruction.
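The abstract does not state how alignment accuracy was scored against the true alignment. As an illustration only, the sketch below uses one common convention, a sum-of-pairs style score: the fraction of residue pairs aligned in the true alignment that are also aligned in the hypothesized alignment. The function names `aligned_pairs` and `alignment_accuracy`, the `-` gap character, and the toy sequences are all assumptions made for this example, not details taken from the study.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each pair is (seq_i, residue_i, seq_j, residue_j), where residue_* is the
    index of the residue in the ungapped sequence. Rows must be given in the
    same sequence order in every alignment being compared.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    counters = [0] * len(alignment)  # residues consumed so far, per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_id, char in enumerate(column):
            if char != GAP:
                present.append((seq_id, counters[seq_id]))
                counters[seq_id] += 1
        for (i, pos_i), (j, pos_j) in combinations(present, 2):
            pairs.add((i, pos_i, j, pos_j))
    return pairs


def alignment_accuracy(true_alignment, hypothesized_alignment):
    """Fraction of truly aligned residue pairs recovered by the hypothesis."""
    true_pairs = aligned_pairs(true_alignment)
    if not true_pairs:
        raise ValueError("true alignment contains no aligned residue pairs")
    hyp_pairs = aligned_pairs(hypothesized_alignment)
    return len(true_pairs & hyp_pairs) / len(true_pairs)


# Toy example (made-up sequences): the hypothesis misplaces one gap.
true_aln = ["AC-GT", "ACAGT", "A--GT"]
hyp_aln = ["ACG-T", "ACAGT", "A-G-T"]
print(alignment_accuracy(true_aln, hyp_aln))
```

A per-branch measure, as mentioned in the abstract, could in principle restrict the same comparison to the sequences at either end of a branch, but that restriction is likewise an assumption rather than the authors' stated procedure.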