Yang Z
Department of Zoology, Natural History Museum, London, United Kingdom.
J Mol Evol. 1994 Sep;39(3):306-14. doi: 10.1007/BF00160154.
Two approximate methods are proposed for maximum likelihood phylogenetic estimation, both of which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called the "discrete gamma model," uses several categories of rates to approximate the gamma distribution, with equal probability for each category. The mean of each category is used to represent all the rates falling in the category. The performance of this method is found to be quite good, and four such categories appear to be sufficient to produce both an optimum, or near-optimum, fit by the model to the data and an acceptable approximation to the continuous distribution. The second method, called the "fixed-rates model," classifies sites into several classes according to their rates predicted assuming the star tree. Sites in different classes are then assumed to be evolving at these fixed rates when other tree topologies are evaluated. Analyses of the data sets suggest that this method can produce reasonable results, but it seems to share some properties of a least-squares pairwise comparison; for example, interior branch lengths in nonbest trees are often found to be zero. The computational requirements of the two methods are comparable to those of Felsenstein's (1981, J Mol Evol 17:368-376) model, which assumes a single rate for all the sites.
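The discrete gamma construction described above can be sketched in a few lines. This is an illustrative sketch only, not the author's code: it assumes the gamma distribution is parameterized with shape and rate both equal to alpha (so the mean rate is 1), and the function names `reg_lower_gamma`, `gamma_quantile`, and `discrete_gamma_rates` are hypothetical. The category means rely on the standard identity that the first moment of a gamma(alpha, alpha) variable truncated at c is the regularized incomplete gamma function with shape alpha + 1 evaluated at alpha * c.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    # Series: sum over n of x^n / (a (a+1) ... (a+n)), scaled by x^a e^-x / Gamma(a).
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(p, shape, rate):
    """Quantile of a gamma(shape, rate) distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, hi * rate) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid * rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories (assumed mean rate 1)."""
    # Cut points that split the gamma(alpha, alpha) density into k equal-probability bins.
    cuts = [0.0] + [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)]
    # Cumulative first moment up to each cut point; the last bin extends to infinity.
    cum = [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    # Each bin has probability 1/k, so its conditional mean is k times its share
    # of the first moment.
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]
```

Calling `discrete_gamma_rates(0.5, 4)` returns four increasing rates whose average is 1. Under this sketch, the likelihood at a site would be the equally weighted average of the site likelihoods computed at each of these rates, which is why the cost grows roughly in proportion to the number of categories.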