Nei M, Tajima F, Tateno Y
J Mol Evol. 1983;19(2):153-70. doi: 10.1007/BF02300753.
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by computer simulation. The methods examined were UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation, eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution, five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's fθ, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results indicate that, in all tree-making methods examined, the accuracies of both the topology and the branch lengths of a reconstructed tree (rooted tree) are very low when fewer than 20 loci are used, but increase gradually as the number of loci increases. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P), UPGMA and the modified Farris method generally perform better than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance, which obeys the triangle inequality, is used. The main reason for this seems to be that the Farris method often overestimates branch lengths. For estimating the expected branch lengths of the true tree, UPGMA shows the best performance. For this purpose, Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by restriction enzyme techniques.
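To make the pipeline described above concrete (allele frequencies, then a pairwise distance matrix, then a UPGMA tree), the following is a minimal sketch in plain Python. It is not the authors' simulation code. The data layout (each population as a list of loci, each locus as a list of allele frequencies in a shared allele order) and all function names are assumptions made for illustration. Only three of the five distance measures are shown, using their standard textbook definitions.

```python
import math

def _identities(x, y):
    """Return (J_X, J_Y, J_XY), each averaged over all loci."""
    n = len(x)
    jx = sum(p * p for locus in x for p in locus) / n
    jy = sum(q * q for locus in y for q in locus) / n
    jxy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly)) / n
    return jx, jy, jxy

def nei_standard(x, y):
    """Nei's standard distance D = -ln(J_XY / sqrt(J_X * J_Y)).
    Raises ValueError if the populations share no alleles (J_XY = 0)."""
    jx, jy, jxy = _identities(x, y)
    return -math.log(jxy / math.sqrt(jx * jy))

def nei_minimum(x, y):
    """Nei's minimum distance D_m = (J_X + J_Y) / 2 - J_XY."""
    jx, jy, jxy = _identities(x, y)
    return (jx + jy) / 2 - jxy

def rogers(x, y):
    """Rogers' distance: mean over loci of sqrt(0.5 * sum((x_i - y_i)^2))."""
    per_locus = (
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(lx, ly)))
        for lx, ly in zip(x, y)
    )
    return sum(per_locus) / len(x)

def upgma(names, matrix):
    """Build a rooted UPGMA tree from a symmetric distance matrix.
    Each internal node is a tuple (left, right, height above the tips)."""
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    dist = {(i, j): matrix[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        # Join the closest pair of clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        for k in trees:
            if k not in (a, b):
                d_ka = dist.pop((min(k, a), max(k, a)))
                d_kb = dist.pop((min(k, b), max(k, b)))
                # Size-weighted average keeps the unweighted mean over tips.
                dist[(k, next_id)] = (
                    sizes[a] * d_ka + sizes[b] * d_kb
                ) / (sizes[a] + sizes[b])
        del dist[(a, b)]
        trees[next_id] = (trees.pop(a), trees.pop(b), d_ab / 2)
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)
        next_id += 1
    return next(iter(trees.values()))

# Toy example with made-up distances: A and B are close, C is distant.
toy_tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
# toy_tree == ('C', ('A', 'B', 1.0), 3.0)
```

In a study like the one summarized here, the matrix passed to `upgma` would be filled by applying one distance function to every pair of populations. The choice of distance function then affects how branch lengths scale with the number of gene substitutions.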