Bordewich Magnus, Mihaescu Radu
Durham University, Durham.
IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):576-83. doi: 10.1109/TCBB.2013.39.
Distance-based phylogenetic methods attempt to reconstruct an accurate phylogenetic tree from an estimated matrix of pairwise distances between taxa. This paper examines two distance-based algorithms (GreedyBME and FastME) that are based on the principle of minimizing the balanced minimum evolution score of the output tree in relation to the given estimated distance matrix. This is also the principle that underlies the neighbor-joining (NJ) algorithm. We show that GreedyBME and FastME both reconstruct the entire correct tree if the input data are quartet consistent, and also that if the maximum error of any distance estimate is epsilon, then both algorithms output trees containing all sufficiently long edges of the true tree: those having length at least 3epsilon. That is to say, the algorithms have edge safety radius 1/3. In contrast, quartet consistency of the data is not sufficient to guarantee the NJ algorithm reconstructs the correct tree, and moreover, the NJ algorithm has edge safety radius of 1/4: Only edges of the true tree of length at least 4epsilon can be guaranteed to appear in the output. These results give further theoretical support to the experimental evidence suggesting FastME is a more suitable distance-based phylogeny reconstruction method than the NJ algorithm.
基于距离的系统发育方法试图从分类单元之间成对距离的估计矩阵中重建一棵准确的系统发育树。本文研究了两种基于距离的算法(GreedyBME和FastME),它们基于相对于给定估计距离矩阵最小化输出树的平衡最小进化得分的原则。这也是邻接法(NJ)算法的基础原则。我们表明,如果输入数据是四重一致的,那么GreedyBME和FastME都能重建整个正确的树,并且如果任何距离估计的最大误差为ε,那么这两种算法输出的树都包含真实树的所有足够长的边:那些长度至少为3ε的边。也就是说,这些算法的边安全半径为1/3。相比之下,数据的四重一致性不足以保证NJ算法能重建正确的树,而且NJ算法的边安全半径为1/4:只有真实树中长度至少为4ε的边才能保证出现在输出中。这些结果为实验证据提供了进一步的理论支持,表明FastME是一种比NJ算法更适合的基于距离的系统发育重建方法。