Chung Yujin, Hey Jody
Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA.
Department of Biology, Temple University, Philadelphia, PA.
Mol Biol Evol. 2017 Jun 1;34(6):1517-1528. doi: 10.1093/molbev/msx070.
We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus.
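The core of step 2 is importance-sampling reweighting. Suppose step 1 draws genealogies G_i from a density proportional to p(X | G) g(G). Then the posterior density of the demographic parameters is proportional to p(Θ) times the average of f(G_i | Θ) / g(G_i) over the sample. The sketch below illustrates only that identity and is not MIST's implementation. It makes several assumptions. The demographic model is a single constant-size population with one parameter theta, not the Isolation-with-Migration model. The importance distribution g is taken to be the same coalescent at a fixed, hypothetical `THETA_REF`. Simulated genealogies stand in for the step-1 MCMC output. All function names are hypothetical.

```python
import math
import random

THETA_REF = 1.0  # hypothetical reference value that defines the importance distribution g


def log_coalescent_density(times, theta):
    """Log density of one locus genealogy under a constant-size coalescent.

    `times` maps j (number of lineages) to the length of the interval with j
    lineages. Each pair coalesces at rate 2/theta, so that interval contributes
    log(2/theta) - j*(j-1)*t_j/theta.
    """
    return sum(math.log(2.0 / theta) - j * (j - 1) * t / theta
               for j, t in times.items())


def log_f(genealogy, theta):
    """Toy demographic model of interest: loci are independent given theta."""
    return sum(log_coalescent_density(locus, theta) for locus in genealogy)


def log_g(genealogy):
    """Assumed importance distribution: the same coalescent with theta fixed at THETA_REF."""
    return sum(log_coalescent_density(locus, THETA_REF) for locus in genealogy)


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_on_grid(sample, grid, log_prior, log_model):
    """Reweight one fixed sample of genealogies into a posterior over theta.

    Uses p(theta | X) proportional to p(theta) * mean_i f(G_i | theta) / g(G_i),
    valid when the G_i were drawn from a density proportional to p(X | G) g(G).
    Returns probabilities normalized over the grid points.
    """
    log_g_vals = [log_g(G) for G in sample]  # g does not depend on theta
    log_post = []
    for theta in grid:
        ratios = [log_model(G, theta) - lg for G, lg in zip(sample, log_g_vals)]
        log_post.append(log_prior(theta) + log_sum_exp(ratios) - math.log(len(sample)))
    norm = log_sum_exp(log_post)
    return [math.exp(v - norm) for v in log_post]


def simulate_locus(n, theta, rng):
    """Intercoalescent times for n lineages: interval j is Exp(rate j*(j-1)/theta)."""
    return {j: rng.expovariate(j * (j - 1) / theta) for j in range(n, 1, -1)}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for step-1 output: 20 sampled genealogies, each
    # spanning 100 loci with 5 gene copies, simulated at theta = 2.
    sample = [[simulate_locus(5, 2.0, rng) for _ in range(100)] for _ in range(20)]
    grid = [0.5 + 0.1 * k for k in range(46)]  # theta from 0.5 to 5.0
    post = posterior_on_grid(sample, grid, lambda theta: 0.0, log_f)
    best = grid[post.index(max(post))]
    print(f"posterior mode near theta = {best:.1f}")
```

Because `posterior_on_grid` takes the model density `log_model` as an argument, a different demographic model can be evaluated on the same `sample` without repeating step 1. Under the assumptions above, this mirrors the abstract's point that one set of sampled trees serves multiple models.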