Felsenstein J
Department of Genetics, University of Washington, Box 357360, Seattle, WA 98195-7360, USA.
J Mol Evol. 2001 Oct-Nov;53(4-5):447-55. doi: 10.1007/s002390010234.
As methods of molecular phylogeny have become more explicit and more biologically realistic following the pioneering work of Thomas Jukes, they have had to relax their initial assumption that rates of evolution were equal at all sites. Distance matrix and likelihood methods of inferring phylogenies make this assumption; parsimony, when valid, is less limited by it. Nucleotide sequences, including RNA sequences, can show substantial rate variation; protein sequences show rates that vary much more widely. Assuming a prior distribution of rates such as a gamma distribution or lognormal distribution has deservedly been popular, but for likelihood methods it leads to computational difficulties. These can be resolved using hidden Markov model (HMM) methods which approximate the distribution by one with a modest number of discrete rates. Generalized Laguerre quadrature can be used to improve the selection of rates and their probabilities so as to more nearly approach the desired gamma distribution. A model based on population genetics is presented predicting how the rates of evolution might vary from locus to locus. Challenges for the future include allowing rates at a given site to vary along the tree, as in the "covarion" model, and allowing them to have correlations that reflect three-dimensional structure, rather than position in the coding sequence. Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models.
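The quadrature idea mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a gamma distribution of rates with mean 1 and shape parameter `shape`, and builds the discrete rates and their probabilities from the nodes and weights of generalized Gauss-Laguerre quadrature with exponent `shape - 1`. All function names here are hypothetical, and the Jukes-Cantor two-sequence calculation at the end is only an illustrative use of the resulting rate categories.

```python
import math

def genlaguerre(n, alpha, x):
    """Evaluate the generalized Laguerre polynomial L_n^(alpha)(x) by its three-term recurrence."""
    prev, cur = 0.0, 1.0  # L_{-1} (unused placeholder) and L_0
    for k in range(n):
        prev, cur = cur, ((2 * k + 1 + alpha - x) * cur - (k + alpha) * prev) / (k + 1)
    return cur

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f changes sign exactly once between lo and hi."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def laguerre_roots(n, alpha):
    """Roots of L_n^(alpha), found degree by degree.

    The roots of L_k interlace those of L_{k-1}, so each root of L_k is
    bracketed by 0, the previous roots, and a sufficiently large upper bound.
    """
    roots = []
    for k in range(1, n + 1):
        f = lambda x, k=k: genlaguerre(k, alpha, x)
        last = roots[-1] if roots else 0.0
        hi = last + 1.0
        while f(hi) * f(last) > 0:  # grow the bound until the sign flips
            hi *= 2.0
        edges = [0.0] + roots + [hi]
        roots = [bisect(f, a, b) for a, b in zip(edges, edges[1:])]
    return roots

def discrete_gamma_rates(shape, n):
    """Return (rates, probs) approximating a mean-1 gamma distribution with n categories.

    Assumption: the density is proportional to r**(shape-1) * exp(-shape*r).
    Substituting x = shape * r gives the Laguerre weight x**alpha * exp(-x)
    with alpha = shape - 1; dividing the weights by Gamma(shape) yields probabilities.
    """
    alpha = shape - 1.0
    nodes = laguerre_roots(n, alpha)
    scale = math.gamma(n + shape) / (math.gamma(shape) * math.factorial(n) * (n + 1) ** 2)
    probs = [scale * x / genlaguerre(n + 1, alpha, x) ** 2 for x in nodes]
    rates = [x / shape for x in nodes]
    return rates, probs

def jc_diff_prob(t, rates, probs):
    """Illustrative use: Jukes-Cantor probability that two sequences differ at a
    site after branch length t, averaged over the discrete rate categories."""
    return sum(p * 0.75 * (1.0 - math.exp(-4.0 * r * t / 3.0)) for r, p in zip(rates, probs))

if __name__ == "__main__":
    rates, probs = discrete_gamma_rates(shape=0.5, n=4)
    for r, p in zip(rates, probs):
        print(f"rate {r:.4f}  probability {p:.4f}")
    print("averaged difference probability:", jc_diff_prob(0.1, rates, probs))
```

Because an n-point Gaussian rule integrates polynomials up to degree 2n - 1 exactly, the discrete distribution built this way matches the first 2n - 1 moments of the gamma distribution, including its mean and (for n of at least 2) its variance. For production use, a numerical library routine for generalized Gauss-Laguerre nodes and weights would likely be preferable to the hand-rolled root finder shown here.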