Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Vienna, Austria.
Mol Biol Evol. 2012 Feb;29(2):663-73. doi: 10.1093/molbev/msr220. Epub 2011 Sep 22.
Among the criteria used to evaluate the performance of a phylogenetic method, robustness to model violation is of particular practical importance because complete a priori knowledge of evolutionary processes is typically unavailable. For studies of robustness in phylogenetic inference, a utility that adds well-defined model violations to simulated data would be helpful. We therefore introduce ImOSM, a tool to imbed intermittent evolution as a model violation into an alignment. Intermittent evolution refers to extra substitutions occurring randomly on branches of a tree, thus changing alignment site patterns. That is, the extra substitutions are placed on the tree after the typical process of sequence evolution is completed. We then study the robustness of three widely used phylogenetic methods, maximum likelihood (ML), maximum parsimony (MP), and a distance-based method (BIONJ), to various scenarios of model violation. Violation of rates across sites (RaS) heterogeneity, and simultaneous violation of RaS and the transition/transversion ratio on two nonadjacent external branches, hinder all the methods' recovery of the true topology for a four-taxon tree. For an eight-taxon balanced tree, the violations cause each of the three methods to infer a different topology: both ML and MP fail, whereas BIONJ, which calculates the distances based on the ML-estimated parameters, reconstructs the true tree. Finally, we report that a test of model homogeneity and goodness-of-fit tests have enough power to detect such model violations. The outcome of these tests can help to gain confidence in the inferred trees. Therefore, we recommend using these tests in practical phylogenetic analyses.
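The idea of placing extra substitutions on branches of a tree after sequence evolution is complete can be illustrated with a minimal sketch. This is not ImOSM's implementation. It assumes, purely for illustration, that a branch is represented by the list of taxa on one side of it, and that each extra substitution acts as a fixed permutation of the nucleotides at one alignment site. The function names, the permutation tables, and the toy alignment are all hypothetical.

```python
import random

# Hypothetical substitution classes, each modeled as a permutation of the
# four nucleotides (an assumption made for this sketch).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSION_1 = {"A": "C", "C": "A", "G": "T", "T": "G"}
TRANSVERSION_2 = {"A": "T", "T": "A", "C": "G", "G": "C"}


def apply_extra_substitution(alignment, taxa_below_branch, site, permutation):
    """Return a copy of the alignment in which one extra substitution on a
    branch changes the given site in every taxon below that branch."""
    result = dict(alignment)
    for taxon in taxa_below_branch:
        seq = result[taxon]
        # Characters outside the permutation (e.g. gaps) are left unchanged.
        new_char = permutation.get(seq[site], seq[site])
        result[taxon] = seq[:site] + new_char + seq[site + 1:]
    return result


def embed_intermittent_evolution(alignment, branches, n_extra, rng):
    """Place n_extra substitutions on randomly chosen branches and sites."""
    length = len(next(iter(alignment.values())))
    classes = [TRANSITION, TRANSVERSION_1, TRANSVERSION_2]
    result = alignment
    for _ in range(n_extra):
        branch = rng.choice(branches)
        site = rng.randrange(length)
        result = apply_extra_substitution(result, branch, site, rng.choice(classes))
    return result


# Toy four-taxon alignment, assumed to be the output of a standard simulation.
alignment = {"t1": "ACGT", "t2": "ACGT", "t3": "ACGT", "t4": "ACGT"}

# One extra transition at site 0 on the internal branch above t1 and t2.
modified = apply_extra_substitution(alignment, ["t1", "t2"], 0, TRANSITION)

# Two nonadjacent external branches, each given as its set of descendant taxa.
branches = [["t1"], ["t3"]]
violated = embed_intermittent_evolution(alignment, branches, 5, random.Random(1))
```

In this sketch a substitution on an external branch alters a single sequence, whereas one on an internal branch alters every sequence on one side of the split, which is how the site pattern changes.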