Department of Computer Science, ETH Zurich, Zurich, Switzerland.
BMC Bioinformatics. 2012 Jun 27;13:148. doi: 10.1186/1471-2105-13-148.
We analyze phylogenetic tree-building methods from molecular sequences (PTMSs). These methods base their construction solely on sequences, either coding DNA or amino acids.
Our first result is a statistically significant evaluation of 176 PTMSs, done by comparing trees derived from 193,138 orthologous groups of proteins using a new quality measure that compares trees with one another. This measure, called the Intra measure, is very consistent across different groups of species and strong in the sense that it separates the methods with high confidence. Our second result is a comparison of the trees against trees derived from accepted taxonomies, the Taxon measure. We consider the NCBI taxonomic classification and its derived topologies to be the most widely accepted biological consensus on phylogenies; they are also available in electronic form. The correlation between the two measures is remarkably high, which supports both measures simultaneously.
The big surprise of the evaluation is that the maximum likelihood (ML) methods do not score well; minimal evolution distance methods over alignments induced by multiple sequence alignments (MSAs) score consistently better. This comparison also allows us to rank different components of the tree-building methods, such as MSAs, substitution matrices, ML tree builders, and distance methods. There is also a clear difference between Metazoa and the rest, which points to evolution leaving different molecular traces. We also think that these measures of tree quality will motivate the design of new PTMSs, as it is now easier to evaluate them with certainty.
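Both measures rest on comparing the topology of one tree with that of another. The abstract does not define the Intra or Taxon measures, so the sketch below should not be read as either of them. It substitutes the standard Robinson-Foulds (symmetric difference) distance purely as an illustration of what a tree-to-tree comparison can look like. The nested-tuple tree encoding and the function names `leaves`, `splits`, and `rf_distance` are assumptions made for this example.

```python
def leaves(tree):
    """Return the set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions (splits) of the unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation
    regardless of where the tuple encoding happens to be rooted.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # arbitrary but deterministic reference leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must be defined over the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-species trees, for illustration only.
tree_1 = (("A", "B"), "C", ("D", "E"))
tree_2 = (("A", "C"), "B", ("D", "E"))
distance = rf_distance(tree_1, tree_2)
```

A distance of zero means the two trees have identical unrooted topologies; larger values mean more disagreeing splits. Averaging such a distance over many pairs of trees is one plausible way to turn pairwise comparisons into a score for a method, though whether the Intra and Taxon measures work this way is not stated in the abstract.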