Gascuel O
GERAD, Ecole des HEC, Montreal, Quebec, Canada.
Mol Biol Evol. 1997 Jul;14(7):685-95. doi: 10.1093/oxfordjournals.molbev.a025808.
We propose an improved version of the neighbor-joining (NJ) algorithm of Saitou and Nei. This new algorithm, BIONJ, follows the same agglomerative scheme as NJ, which consists of iteratively picking a pair of taxa, creating a new mode which represents the cluster of these taxa, and reducing the distance matrix by replacing both taxa by this node. Moreover, BIONJ uses a simple first-order model of the variances and covariances of evolutionary distance estimates. This model is well adapted when these estimates are obtained from aligned sequences. At each step it permits the selection, from the class of admissible reductions, of the reduction which minimizes the variance of the new distance matrix. In this way, we obtain better estimates to choose the pair of taxa to be agglomerated during the next steps. Moreover, in comparison with NJ's estimates, these estimates become better and better as the algorithm proceeds. BIONJ retains the good properties of NJ--especially its low run time. Computer simulations have been performed with 12-taxon model trees to determine BIONJ's efficiency. When the substitution rates are low (maximum pairwise divergence approximately 0.1 substitutions per site) or when they are constant among lineages, BIONJ is only slightly better than NJ. When the substitution rates are higher and vary among lineages,BIONJ clearly has better topological accuracy. In the latter case, for the model trees and the conditions of evolution tested, the topological error reduction is on the average around 20%. With highly-varying-rate trees and with high substitution rates (maximum pairwise divergence approximately 1.0 substitutions per site), the error reduction may even rise above 50%, while the probability of finding the correct tree may be augmented by as much as 15%.
我们提出了一种改进版的斋藤和内的邻接法(NJ)算法。这种新算法BIONJ遵循与NJ相同的凝聚方案,该方案包括迭代地选择一对分类单元,创建一个代表这些分类单元聚类的新节点,并通过用该节点替换这两个分类单元来简化距离矩阵。此外,BIONJ使用了一个关于进化距离估计的方差和协方差的简单一阶模型。当这些估计是从比对序列中获得时,这个模型非常适用。在每一步,它允许从可允许的简化类别中选择能使新距离矩阵方差最小的简化方式。通过这种方式,我们能获得更好的估计,以便在后续步骤中选择要凝聚的分类单元对。而且,与NJ的估计相比,随着算法的进行,这些估计会越来越好。BIONJ保留了NJ的良好特性——尤其是其运行时间短。我们使用12分类单元的模型树进行了计算机模拟,以确定BIONJ的效率。当替换率较低(最大成对差异约为每位点0.1次替换)或在谱系间恒定时,BIONJ仅比NJ略好。当替换率较高且在谱系间变化时,BIONJ明显具有更好的拓扑准确性。在后一种情况下,对于测试的模型树和进化条件,拓扑误差平均降低约20%。对于替换率高度变化的树和高替换率(最大成对差异约为每位点1.0次替换),误差降低甚至可能超过50%,而找到正确树的概率可能会增加多达15%。