Department of Statistics, Harvard University, Cambridge, MA.
Informatics Group, Harvard University, Cambridge, MA.
Mol Biol Evol. 2019 May 1;36(5):1086-1100. doi: 10.1093/molbev/msz049.
Conservation of DNA sequence over evolutionary time is a strong indicator of function, and gain or loss of sequence conservation can be used to infer changes in function across a phylogeny. Changes in evolutionary rates on particular lineages in a phylogeny can indicate shared functional shifts, and thus can be used to detect genomic correlates of phenotypic convergence. However, existing methods do not allow easy detection of patterns of rate variation, which causes challenges for detecting convergent rate shifts or other complex evolutionary scenarios. Here we introduce PhyloAcc, a new Bayesian method to model substitution rate changes in conserved elements across a phylogeny. The method assumes several categories of substitution rate for each branch on the phylogenetic tree, estimates substitution rates per category, and detects changes of substitution rate as the posterior probability of a category switch. Simulations show that PhyloAcc can detect genomic regions with rate shifts in multiple target species better than previous methods and has a higher accuracy of reconstructing complex patterns of substitution rate changes than prevalent Bayesian relaxed clock models. We demonstrate the utility of PhyloAcc in two classic examples of convergent phenotypes: loss of flight in birds and the transition to marine life in mammals. In each case, our approach reveals numerous examples of conserved nonexonic elements with accelerations specific to the phenotypically convergent lineages. Our method is widely applicable to any set of conserved elements where multiple rate changes are expected on a phylogeny.
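As a rough illustration of the idea stated in the abstract (a substitution-rate category per branch, with rate changes detected as the posterior probability of a category switch), the following is a minimal toy sketch and not the PhyloAcc implementation. It makes several simplifying assumptions that do not come from the paper: a single root-to-tip lineage instead of a full tree, observed per-branch substitution counts with a Poisson likelihood instead of a sequence alignment, fixed rather than estimated category rates, and a hypothetical `stay` probability for keeping the same category from one branch to the next. The function name `posterior_categories` and all parameter values are illustrative only.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of k events given the expected count `mean` (> 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def normalize(values):
    """Rescale a list of non-negative weights so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def posterior_categories(counts, lengths, rates, n_sites=100, stay=0.9):
    """Toy forward-backward pass along one lineage (assumption: a chain, not a tree).

    counts[i]  : substitutions observed on branch i (toy data)
    lengths[i] : branch length in expected neutral substitutions per site
    rates[c]   : relative rate of category c (fixed here, not estimated)
    stay       : hypothetical probability of keeping the parent's category
    Returns (post, switch): post[i][c] is the posterior probability of
    category c on branch i; switch[i] is the posterior probability that
    branches i and i+1 are in different categories. Needs >= 2 categories.
    """
    n_cat = len(rates)
    # Likelihood of each branch's count under each rate category.
    emit = [[poisson_pmf(k, n_sites * t * r) for r in rates]
            for k, t in zip(counts, lengths)]
    # Category transition matrix between consecutive branches.
    trans = [[stay if a == b else (1.0 - stay) / (n_cat - 1)
              for b in range(n_cat)] for a in range(n_cat)]

    # Forward pass with a uniform prior on the first branch.
    fwd = [normalize([e / n_cat for e in emit[0]])]
    for e in emit[1:]:
        prev = fwd[-1]
        fwd.append(normalize(
            [e[b] * sum(prev[a] * trans[a][b] for a in range(n_cat))
             for b in range(n_cat)]))

    # Backward pass.
    bwd = [[1.0] * n_cat for _ in emit]
    for i in range(len(emit) - 2, -1, -1):
        bwd[i] = normalize(
            [sum(trans[a][b] * emit[i + 1][b] * bwd[i + 1][b]
                 for b in range(n_cat)) for a in range(n_cat)])

    # Per-branch posterior over categories.
    post = [normalize([f * b for f, b in zip(f_row, b_row)])
            for f_row, b_row in zip(fwd, bwd)]

    # Posterior probability of a category switch between consecutive branches.
    switch = []
    for i in range(len(emit) - 1):
        joint = [[fwd[i][a] * trans[a][b] * emit[i + 1][b] * bwd[i + 1][b]
                  for b in range(n_cat)] for a in range(n_cat)]
        total = sum(sum(row) for row in joint)
        same = sum(joint[a][a] for a in range(n_cat))
        switch.append(1.0 - same / total)
    return post, switch


# Toy example: two categories (conserved = 0.2, accelerated = 1.0 relative rate)
# and four consecutive branches; the last two carry many more substitutions.
post, switch = posterior_categories(
    counts=[2, 1, 12, 9], lengths=[0.1, 0.1, 0.1, 0.1], rates=[0.2, 1.0])
for i, row in enumerate(post):
    print(f"branch {i}: P(accelerated) = {row[1]:.3f}")
for i, p in enumerate(switch):
    print(f"switch between branch {i} and {i + 1}: {p:.3f}")
```

In this toy setting, the switch probability should be concentrated between the second and third branches, where the substitution counts jump. A method working on real alignments and a full phylogeny would need a tree-structured model and estimated rates, which this sketch does not attempt.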