Department of Statistics, University of Kentucky.
Department of Biology and Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park; Department of Biology, Temple University.
Mol Biol Evol. 2016 Jun;33(6):1618-24. doi: 10.1093/molbev/msw042. Epub 2016 Feb 28.
It is now often stated that the maximum likelihood or the Bayesian method of phylogenetic construction is more accurate than the neighbor joining (NJ) method. Our computer simulations, however, have shown that the converse is true if p distance is used in the NJ procedure and accuracy is measured either by the criterion of obtaining the true tree (Pc, expressed as a percentage) or by the combined quantity (c) of Pc and Robinson-Foulds' average topological error index (dT). This c is given by Pc (1 - dT/dTmax) = Pc (m - 3 - dT/2)/(m - 3), where m is the number of taxa used and dTmax is the maximum possible value of dT, which is given by 2(m - 3). With Pc taken as a proportion, c takes a value between 0 and 1, and a tree-making method giving a high value of c is considered to be good. The neighbor joining method with p distance (NJp method) will be shown generally to give the best data-fit model. Our computer simulations have shown that the NJp method generally performs better than the other methods, and it should therefore be used in general, whether or not the gene is compositional or contains mosaic DNA regions.
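As a minimal sketch of the quantities named in the abstract, the Python below computes dTmax = 2(m - 3) and the combined index c = Pc (1 - dT/dTmax). It also computes a p distance, assuming the standard definition (the proportion of aligned sites at which two sequences differ), which the abstract does not spell out. The function names, the treatment of Pc as a proportion rather than a percentage, and the requirement of equal-length gap-free sequences are assumptions made for illustration, not the authors' implementation.

```python
def max_rf_distance(m: int) -> int:
    """Maximum Robinson-Foulds distance, dTmax = 2(m - 3), for m taxa."""
    if m < 4:
        raise ValueError("m must be at least 4 so that dTmax is positive")
    return 2 * (m - 3)


def combined_index(pc: float, d_t: float, m: int) -> float:
    """Combined quantity c = Pc * (1 - dT / dTmax).

    pc  : proportion of replicates recovering the true tree (0..1, assumed)
    d_t : average topological error index dT (0..dTmax)
    m   : number of taxa
    """
    if not 0.0 <= pc <= 1.0:
        raise ValueError("pc must be a proportion between 0 and 1")
    d_max = max_rf_distance(m)
    if not 0.0 <= d_t <= d_max:
        raise ValueError("d_t must lie between 0 and dTmax")
    return pc * (1.0 - d_t / d_max)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites that differ between two aligned sequences.

    Assumes equal-length sequences with no gap handling (illustrative only).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formula.
    print(combined_index(pc=0.8, d_t=2.0, m=8))
    print(p_distance("ACGTACGT", "ACGTTCGA"))
```

Because dTmax = 2(m - 3), the same value follows from the second form in the abstract, Pc (m - 3 - dT/2)/(m - 3). A method that always recovers the true tree (Pc = 1, dT = 0) scores c = 1, and c falls toward 0 as either Pc drops or dT approaches dTmax.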