Kolaczkowski Bryan, Thornton Joseph W
Center for Ecology and Evolutionary Biology, University of Oregon, USA.
Mol Biol Evol. 2008 Jun;25(6):1054-66. doi: 10.1093/molbev/msn042. Epub 2008 Mar 3.
Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called "heterotachy." Here we examine the accuracy of two phylogenetic methods for incorporating heterotachy: the mixed branch length model, which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree, and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.
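The phrase "summing likelihoods over multiple sets of branch lengths on the same tree" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a fixed four-taxon tree ((A,B),(C,D)), the Jukes-Cantor (JC69) substitution model, and branch length sets and class weights supplied by hand, whereas a real analysis would estimate these by maximum likelihood. All function names, the toy site patterns, and the numeric values are hypothetical.

```python
import math
from itertools import product

STATES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability that state i becomes state j over a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(pattern, branches):
    """Likelihood of one site pattern (a, b, c, d) on the tree ((A,B),(C,D)).

    branches = (tA, tB, tC, tD, t_internal); the two internal nodes x and y
    are summed over all possible states, with uniform JC69 base frequencies.
    """
    a, b, c, d = pattern
    t_a, t_b, t_c, t_d, t_int = branches
    total = 0.0
    for x, y in product(STATES, repeat=2):
        total += (0.25
                  * jc69_prob(x, a, t_a) * jc69_prob(x, b, t_b)
                  * jc69_prob(x, y, t_int)
                  * jc69_prob(y, c, t_c) * jc69_prob(y, d, t_d))
    return total

def mixed_log_likelihood(sites, branch_sets, weights):
    """Log-likelihood under a mixture of branch length sets on one topology.

    Each site's likelihood is the weighted sum of its likelihoods under every
    branch length set; sites are treated as independent, so logs are added.
    """
    log_l = 0.0
    for pattern in sites:
        mix = sum(w * site_likelihood(pattern, bl)
                  for w, bl in zip(weights, branch_sets))
        log_l += math.log(mix)
    return log_l

# Hypothetical toy data: one four-character pattern per site (taxa A, B, C, D).
sites = ["AAAA", "AACC", "ACAC", "ACCA", "AAAC"]

# Two branch length classes whose long branches fall on different lineages,
# a simple caricature of heterotachy (values chosen for illustration only).
class_1 = (0.75, 0.05, 0.75, 0.05, 0.05)
class_2 = (0.05, 0.75, 0.05, 0.75, 0.05)

homotachous = mixed_log_likelihood(sites, [class_1], [1.0])
mixed = mixed_log_likelihood(sites, [class_1, class_2], [0.5, 0.5])
print("single branch length set:", homotachous)
print("two-class mixture:       ", mixed)
```

A homotachous model is the special case with a single branch length set and weight 1. In a tree search, the same mixture likelihood would be maximized for each candidate topology and the topologies then compared.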