Molecular Evolution and Bioinformatics Unit, Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland.
Syst Biol. 2011 Dec;60(6):833-44. doi: 10.1093/sysbio/syr064. Epub 2011 Jul 29.
Current phylogenetic methods attempt to account for evolutionary rate variation across characters in a matrix. This is generally achieved by using sophisticated evolutionary models combined with dense sampling of large numbers of characters. However, systematic biases and superimposed substitutions make this task very difficult. Model adequacy can sometimes be achieved at the cost of adding large numbers of free parameters, each optimized according to some criterion, which increases computation time and the variance of the model estimates. In this study, we develop a simple approach that estimates the relative evolutionary rate of each homologous character. The method uses the similarity between characters as a proxy for evolutionary rate. We work on the premise that if the character-state distribution of a homologous character is similar to that of many other characters, then this character is likely to be evolving relatively slowly. If the character-state distribution of a homologous character is not similar to that of many, or any, of the other characters in a data set, then it is likely to be the result of rapid evolution. We show that, at least in some test cases, the premise can hold and the inferences are robust. Importantly, the method does not use a "starting tree" to make the inference and is therefore tree independent. We demonstrate that this approach can work as well as a maximum likelihood (ML) approach, although the ML method requires a known phylogeny, or at least a very good estimate of that phylogeny. We then demonstrate some uses for this method, including improved phylogeny reconstruction for both deep-level and recent relationships and overcoming systematic biases such as base composition bias. Furthermore, we compare this approach with two well-established methods for reweighting or removing characters. These other methods are tree-based, and we show that they can be systematically biased. We believe this method can be useful for phylogeny reconstruction, for understanding evolutionary rate variation, and for understanding variation in selection across characters.
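The abstract states the premise (a character whose state distribution resembles many others is probably slow) but not a formula. The sketch below is one hypothetical way to turn that premise into a tree-independent score, and it is an assumption, not the authors' published scoring rule. Each column is converted into a partition of taxa grouped by state. The agreement of column i with column j is taken to be the fraction of j's groups that nest inside some group of i. A column's score is its mean agreement with every other column. An invariant column therefore scores 1.0 (slow), and a column in which every taxon differs scores near 0 (fast). The function names `state_partition`, `partition_agreement`, `relative_rates` and `drop_fastest` are illustrative, and the code assumes equal-length sequences with no gaps or missing data.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set

Partition = List[FrozenSet[int]]


def state_partition(column: Sequence[str]) -> Partition:
    """Group taxon indices by character state, e.g. 'AACC' -> [{0,1},{2,3}]."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for taxon, state in enumerate(column):
        groups[state].add(taxon)
    return [frozenset(g) for g in groups.values()]


def partition_agreement(pi: Partition, pj: Partition) -> float:
    """Assumed rule: fraction of the groups in pj that nest inside a group of pi."""
    nested = sum(1 for b in pj if any(b <= a for a in pi))
    return nested / len(pj)


def relative_rates(alignment: Sequence[str]) -> List[float]:
    """Mean agreement of each column with all other columns (no tree needed).

    High scores (1.0 for an invariant column) suggest slow evolution;
    low scores suggest fast evolution. Cost is O(sites^2) comparisons.
    """
    n_sites = len(alignment[0])
    if n_sites < 2 or any(len(seq) != n_sites for seq in alignment):
        raise ValueError("need >= 2 sites and equal-length sequences")
    parts = [state_partition([seq[k] for seq in alignment]) for k in range(n_sites)]
    scores = []
    for i in range(n_sites):
        total = sum(partition_agreement(parts[i], parts[j])
                    for j in range(n_sites) if j != i)
        scores.append(total / (n_sites - 1))
    return scores


def drop_fastest(alignment: Sequence[str], fraction: float) -> List[str]:
    """Hypothetical filter: remove the lowest-scoring (fastest) columns.

    Ties are broken by column order because sorted() is stable.
    """
    scores = relative_rates(alignment)
    n_drop = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    fastest = set(order[:n_drop])
    return ["".join(c for k, c in enumerate(seq) if k not in fastest)
            for seq in alignment]


if __name__ == "__main__":
    # Four taxa, four sites: one invariant column, two identical
    # informative columns, and one column where every taxon differs.
    aln = ["AAAC", "AAAG", "ACGT", "ACGA"]
    print([round(s, 3) for s in relative_rates(aln)])  # [1.0, 0.667, 0.667, 0.0]
    print(drop_fastest(aln, 0.25))  # ['AAA', 'AAA', 'ACG', 'ACG']
```

Because the score is computed only from pairwise comparisons between columns, no starting tree is involved. The scores could then be used to bin, reweight or remove characters, which is the kind of use the abstract describes, although the 25% cut-off above is an arbitrary illustration.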