INRA, UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro), Campus international de Baillarguet, Montferrier-sur-Lez, France.
Mol Biol Evol. 2013 Mar;30(3):654-68. doi: 10.1093/molbev/mss257. Epub 2012 Nov 15.
The recent development of high-throughput genotyping technologies has revolutionized the collection of data in a wide range of both model and nonmodel species. These data generally contain huge amounts of information about the demographic history of populations. In this study, we introduce a new method to estimate divergence times on a diffusion time scale from large single-nucleotide polymorphism (SNP) data sets, conditionally on a population history that is represented as a tree. We further assume that all the observed polymorphisms originate from the most ancestral (root) population; that is, we neglect mutations that occur after the split of the most ancestral population. This method relies on a hierarchical Bayesian model, based on Kimura's time-dependent diffusion approximation of genetic drift. We implemented a Metropolis-Hastings-within-Gibbs sampler to estimate the posterior distribution of the parameters of interest in this model, which we refer to as the Kimura model. Evaluating the Kimura model on simulated population histories, we found that it provides accurate estimates of divergence time. Assessing model fit using the deviance information criterion (DIC) proved efficient for retrieving the correct tree topology among a set of competing histories. We show that this procedure is robust to low-to-moderate gene flow, as well as to ascertainment bias, provided that the most distantly related populations are represented in the discovery panel. As an illustrative example, we finally analyzed published human data consisting of genotypes for 452,198 SNPs from individuals belonging to four populations worldwide. Our results suggest that the Kimura model may be helpful to characterize the demographic history of differentiated populations, using genome-wide allele frequency data.
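To make the notion of a divergence time on a diffusion time scale concrete, the following minimal sketch simulates pure genetic drift under a Wright-Fisher model and recovers the scaled time from the decay of heterozygosity. It is not the hierarchical Bayesian Kimura model described in the abstract: it swaps in a simple moment estimator, assumes the scaled time is defined as tau = t/(2N), and treats the ancestral and present-day population allele frequencies as known, whereas in practice the ancestral frequencies are unobserved. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def drift_frequency(x0, two_n, generations, rng):
    """Wright-Fisher drift without mutation: each generation, resample
    2N gene copies binomially from the current allele frequency."""
    x = x0
    for _ in range(generations):
        copies = sum(1 for _ in range(two_n) if rng.random() < x)
        x = copies / two_n
    return x


def estimate_tau(ancestral, derived):
    """Moment estimator of tau (assumed here to be t / (2N)).

    Under pure drift, E[x_t (1 - x_t)] = x_0 (1 - x_0) (1 - 1/(2N))^t,
    which is approximately x_0 (1 - x_0) exp(-tau) for large 2N.
    """
    h0 = sum(x * (1.0 - x) for x in ancestral)
    ht = sum(x * (1.0 - x) for x in derived)
    return -math.log(ht / h0)


# Illustrative values (assumptions, not taken from the study).
rng = random.Random(1)
TWO_N = 50          # number of gene copies in the population
GENERATIONS = 10    # so the true tau is 10 / 50 = 0.2
N_SNPS = 2000

# All polymorphisms are present in the ancestral (root) population,
# mirroring the assumption that later mutations are neglected.
ancestral = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
derived = [drift_frequency(x0, TWO_N, GENERATIONS, rng) for x0 in ancestral]

tau_true = GENERATIONS / TWO_N
tau_hat = estimate_tau(ancestral, derived)
print(f"true tau = {tau_true:.3f}, estimated tau = {tau_hat:.3f}")
```

The estimate depends only on the ratio t/(2N), not on t and N separately, which is why divergence is expressed on a diffusion time scale rather than in generations.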