Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland;
Department of Integrative Biology, University of California, Berkeley, CA 94720.
Proc Natl Acad Sci U S A. 2019 Mar 12;116(11):5027-5036. doi: 10.1073/pnas.1813836116. Epub 2019 Feb 26.
Patterns of molecular coevolution can reveal structural and functional constraints within or among organic molecules. These patterns are better understood when considering the underlying evolutionary process, which enables us to disentangle the signal of the dependent evolution of sites (coevolution) from the effects of shared ancestry of genes. Conversely, disregarding the dependent evolution of sites when studying the history of genes negatively impacts the accuracy of the inferred phylogenetic trees. Although molecular coevolution and phylogenetic history are interdependent, analyses of the two processes are conducted separately, a choice dictated by computational convenience, but at the expense of accuracy. We present a Bayesian method and associated software to infer how many and which sites of an alignment evolve according to an independent or a pairwise dependent evolutionary process, and to simultaneously estimate the phylogenetic relationships among sequences. We validate our method on synthetic datasets and challenge our predictions of coevolution on the 16S rRNA molecule by comparing them with its known molecular structure. Finally, we assess the accuracy of phylogenetic trees inferred under the assumption of independence among sites using synthetic datasets, the 16S rRNA molecule and 10 additional alignments of protein-coding genes of eukaryotes. Our results demonstrate that inferring phylogenetic trees while accounting for dependent site evolution significantly impacts the estimates of the phylogeny and the evolutionary process.