Tateno Y, Takezaki N, Nei M
National Institute of Genetics, Mishima.
Mol Biol Evol. 1994 Mar;11(2):261-77. doi: 10.1093/oxfordjournals.molbev.a040108.
The relative efficiencies of the maximum-likelihood (ML), neighbor-joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two-parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches. (5) Distance measures that generate the correct topology with high probability do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test in Felsenstein's DNAML, used for examining the statistical significance of branch length estimates, are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is ≤5% and ≥1,000 nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths. (ABSTRACT TRUNCATED AT 400 WORDS)
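The abstract names the distance measures used with the NJ method but does not give their formulas. The sketch below computes the standard textbook forms of the Jukes-Cantor and Kimura two-parameter distances and their gamma-corrected versions. It assumes ungapped, equal-length alignments of A/C/G/T only; the gamma shape parameter `a` is an assumed input, since the abstract does not state which gamma variant or which shape value the authors used. All function names are hypothetical and are not from the paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pq_proportions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (P, Q), the proportions of transitional and transversional differences.

    Assumes aligned, ungapped sequences containing only A, C, G, T.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        # A<->G or C<->T is a transition; any purine<->pyrimidine change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def _check_positive(*terms: float) -> None:
    # The distances are undefined once the log/power arguments reach zero (saturation).
    if any(t <= 0 for t in terms):
        raise ValueError("sequences too divergent: distance is undefined")


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    _check_positive(1 - 4 * p / 3)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    _check_positive(1 - 2 * P - Q, 1 - 2 * Q)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def gamma_jukes_cantor(p: float, a: float) -> float:
    """Gamma Jukes-Cantor distance; a is the (assumed) gamma shape parameter."""
    _check_positive(1 - 4 * p / 3)
    return 0.75 * a * ((1 - 4 * p / 3) ** (-1 / a) - 1)


def gamma_kimura_2p(P: float, Q: float, a: float) -> float:
    """Gamma Kimura two-parameter distance; a is the (assumed) gamma shape parameter."""
    _check_positive(1 - 2 * P - Q, 1 - 2 * Q)
    return (a / 2) * ((1 - 2 * P - Q) ** (-1 / a)
                      + 0.5 * (1 - 2 * Q) ** (-1 / a) - 1.5)


if __name__ == "__main__":
    P, Q = pq_proportions("ACGTACGTAC", "GCGTACGTTC")  # one transition, one transversion
    print("JC:", jukes_cantor(P + Q), "K2P:", kimura_2p(P, Q))
    print("gamma JC (a=1):", gamma_jukes_cantor(P + Q, 1.0))
    print("gamma K2P (a=1):", gamma_kimura_2p(P, Q, 1.0))
```

Each gamma form replaces a term -ln(x) with a(x^(-1/a) - 1), so it converges to the uncorrected distance as `a` grows large and exceeds it for small `a`, where rate variation among sites is strong. The explicit positivity check matters in Python because raising a negative float to a fractional power with `**` returns a complex number instead of raising an error.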