Desper Richard, Gascuel Olivier
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
Mol Biol Evol. 2004 Mar;21(3):587-98. doi: 10.1093/molbev/msh049. Epub 2003 Dec 23.
Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with running time significantly faster than Neighbor-Joining (NJ). This article further explores the properties of the BME principle, and it explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths. Finally, we consider a large simulated data set, with 5,000 100-taxon trees generated by the Aldous beta-splitting distribution encompassing a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings, ranging from Yule-Harding to uniform distributions, and all ranges of maximum pairwise divergence and departure from molecular clock. Interestingly, the covarion parameter has little effect on the tree quality for any of the algorithms. FASTME is freely available on the web.
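The balanced tree length that the BME principle minimizes can be illustrated with a short sketch. It assumes the form in which Pauplin's formula is commonly stated: for a fully resolved unrooted tree, the length is the sum over leaf pairs of 2^(1 - p_ij) * d_ij, where p_ij is the number of edges on the path between leaves i and j. The function names, the adjacency-list representation, and the toy quartet below are hypothetical choices made for illustration. This is not FASTME's implementation, which searches over topologies rather than scoring a single one.

```python
from collections import deque
from itertools import combinations


def path_edge_counts(adj, leaves):
    """Return {(i, j): number of edges on the path from leaf i to leaf j}."""
    counts = {}
    for src in leaves:
        # Breadth-first search from each leaf gives edge counts to all nodes.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if dst != src:
                counts[(src, dst)] = dist[dst]
    return counts


def balanced_tree_length(adj, leaves, d):
    """Balanced length of a binary tree: sum of 2^(1 - p_ij) * d_ij over leaf pairs."""
    counts = path_edge_counts(adj, leaves)
    return sum(
        2.0 ** (1 - counts[(i, j)]) * d[(i, j)]
        for i, j in combinations(leaves, 2)
    )


# Toy quartet (assumed data): leaves 1..4, internal nodes "u" and "v".
# True tree ((1,2),(3,4)) with pendant lengths 1, 2, 3, 4 and internal length 5.
leaves = [1, 2, 3, 4]
d = {(1, 2): 3, (3, 4): 7, (1, 3): 9, (1, 4): 10, (2, 3): 10, (2, 4): 11}

true_adj = {1: ["u"], 2: ["u"], 3: ["v"], 4: ["v"],
            "u": [1, 2, "v"], "v": [3, 4, "u"]}
# Competing topology ((1,3),(2,4)) scored against the same distances.
alt_adj = {1: ["u"], 3: ["u"], 2: ["v"], 4: ["v"],
           "u": [1, 3, "v"], "v": [2, 4, "u"]}

true_length = balanced_tree_length(true_adj, leaves, d)
alt_length = balanced_tree_length(alt_adj, leaves, d)
```

On these additive distances the true topology recovers the sum of its branch lengths (1 + 2 + 3 + 4 + 5), and the competing topology scores higher, so a minimum-evolution search would prefer the true tree in this toy case.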