Program in Bioinformatics and Computational Biology, Department of Biological Sciences, University of Idaho, PO Box 443051, Moscow, ID 83844-3051, USA.
Syst Biol. 2012 Jan;61(1):12-21. doi: 10.1093/sysbio/syr093. Epub 2011 Aug 26.
The rapidly growing availability of multigene sequence data during the past decade has enabled phylogeny estimation at phylogenomic scales. However, dealing with evolutionary process heterogeneity across the genome becomes increasingly challenging. Here we develop a mixture model approach that uses reversible jump Markov chain Monte Carlo (MCMC) estimation to permit as many distinct models as the data require. Each additional model considered may be a fully parametrized general time-reversible model or any of its special cases. Furthermore, we expand the usual proposal mechanisms for topology changes to permit hard polytomies (i.e., zero-length internal branches). This new approach is implemented in the Crux software toolkit. We demonstrate the feasibility of using reversible jump MCMC on mixture models by reexamining a well-known 44-taxon mammalian data set comprising 22 concatenated genes. We are able to reproduce the results of the original analysis (with respect to bipartition support) when we make identical assumptions, but when we allow for polytomies and/or use data-driven mixture model estimation, we infer much lower bipartition support values for several key bipartitions.
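As an illustration of the statement that each mixture component may be a fully parametrized general time-reversible (GTR) model or any of its special cases, the following minimal sketch builds a normalized GTR rate matrix and obtains two familiar special cases (JC69 and HKY85) by constraining its parameters. It is not taken from Crux; the function names, the parameter ordering (AC, AG, AT, CG, CT, GT), and the A, C, G, T state order are assumptions made for this example.

```python
from itertools import combinations


def gtr_rate_matrix(rates, freqs):
    """Build a normalized GTR rate matrix (hypothetical helper, not Crux code).

    rates: six exchangeabilities, assumed order AC, AG, AT, CG, CT, GT.
    freqs: four stationary frequencies, assumed order A, C, G, T.
    """
    if len(rates) != 6 or len(freqs) != 4:
        raise ValueError("expected 6 exchangeabilities and 4 frequencies")
    q = [[0.0] * 4 for _ in range(4)]
    # Off-diagonal entries: q[i][j] = r_ij * pi_j, with r_ij symmetric.
    for r, (i, j) in zip(rates, combinations(range(4), 2)):
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    # Diagonal entries make each row sum to zero.
    for i in range(4):
        q[i][i] = -sum(q[i])
    # Rescale so the expected substitution rate at stationarity is 1.
    scale = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / scale for x in row] for row in q]


def jc69():
    """JC69: all exchangeabilities equal, all frequencies equal."""
    return gtr_rate_matrix([1.0] * 6, [0.25] * 4)


def hky85(kappa, freqs):
    """HKY85: transitions (AG, CT) share rate kappa; transversions have rate 1."""
    return gtr_rate_matrix([1.0, kappa, 1.0, 1.0, kappa, 1.0], freqs)
```

Under this parametrization, a reversible jump move between models amounts to merging or splitting groups of exchangeabilities that are constrained to be equal, which changes the number of free parameters.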