Pagel Mark, Meade Andrew
School of Biological Sciences, University of Reading, Lyle Building, Whiteknights, Reading RG6 6AJ, UK.
Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3955-64. doi: 10.1098/rstb.2008.0178.
The rate at which a given site in a gene sequence alignment evolves may vary over time. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that the method serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
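The central step of the abstract, summing each site's likelihood over more than one set of branch lengths on a fixed topology, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a three-taxon star tree, the Jukes-Cantor (JC69) substitution model, fixed mixture weights, and the function names. The reversible-jump MCMC step that decides which branches need more than one length is not shown.

```python
import math

STATES = "ACGT"

def jc69_prob(a, b, t):
    """JC69 probability that state a is replaced by state b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def site_likelihood(tips, branch_lengths):
    """Likelihood of one site on a star tree, summing over the unknown root state."""
    total = 0.0
    for root in STATES:
        p = 0.25  # JC69 stationary frequency of the root state
        for tip, t in zip(tips, branch_lengths):
            p *= jc69_prob(root, tip, t)
        total += p
    return total

def mixture_site_likelihood(tips, branch_sets, weights):
    """Weighted sum of the site likelihood over several branch-length sets (same topology)."""
    return sum(w * site_likelihood(tips, bl) for w, bl in zip(weights, branch_sets))

def mixture_log_likelihood(alignment, branch_sets, weights):
    """Log-likelihood of an alignment, given as a list of site columns."""
    return sum(math.log(mixture_site_likelihood(col, branch_sets, weights))
               for col in alignment)

# Hypothetical example: two branch-length sets for three taxa, mixed 60/40.
branch_sets = [(0.1, 0.2, 0.3), (0.5, 0.05, 0.3)]
weights = [0.6, 0.4]
columns = ["AAA", "ACA", "GGT"]  # each string is one site across the three taxa
log_lik = mixture_log_likelihood(columns, branch_sets, weights)
```

In this sketch a site whose pattern is better explained by the second branch-length set draws most of its likelihood from that component, which is how a mixture of this kind lets different sites follow different rates along the same branches.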