Desper Richard, Gascuel Olivier
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892, USA.
J Comput Biol. 2002;9(5):687-705. doi: 10.1089/106652702761034136.
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n(3)) time, where n is the number of taxa, while the current implementations of the second are in O(p n(3)) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n(2)) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n(2) + p n), i.e., O(n(2)) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight so that the weight of a subtree is equal to the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring O(n(2) x diam(T)) operations to build the starting tree and O(p n x diam(T)) to perform the NNIs, where diam(T) is the topological diameter of the output tree. In the usual Yule-Harding distribution on phylogenetic trees, the diameter expectation is in log(n), so our algorithms are in practice faster that NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy.
系统发育估计的最小进化(ME)方法已被证明,当它与将度量拟合到树结构的普通最小二乘法(OLS)结合使用时,在统计上是一致的。使用ME的传统方法是从给定矩阵的邻接归并(NJ)拓扑开始,然后从该起点进行拓扑搜索。第一阶段需要O(n(3))时间,其中n是分类单元的数量,而第二阶段的当前实现是O(p n(3))或更多,其中p是程序执行的交换次数。在本文中,我们研究了一种最小进化的贪心方法,该方法在O(n(2))时间内生成一个起始拓扑。此外,我们提供了一种使用最近邻交换(NNI)搜索最佳拓扑的算法,其中执行p次NNI的成本是O(n(2) + p n),即在实际中是O(n(2)),因为p总是远小于n。贪心最小进化(GME)算法与NNI结合使用时,生成的树在拓扑准确性方面与NJ树相当接近。我们还研究了在平衡加权方案下的ME,其中兄弟子树具有相等的权重,这与标准的“未加权”OLS相反,在OLS中所有分类单元具有相同的权重,因此子树的权重等于其分类单元的数量。平衡最小进化方案(BME)的运行速度比OLS版本慢,构建起始树需要O(n(2) x diam(T))操作,执行NNI需要O(p n x diam(T))操作,其中diam(T)是输出树的拓扑直径。在系统发育树通常的尤尔 - 哈丁分布中,直径期望为log(n),所以我们的算法在实际中比NJ更快。此外,就拓扑准确性而言,这种BME方案相对于NJ和其他基于距离的算法有非常显著的改进,尤其是对于大树。