Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
Mol Biol Evol. 2011 Jan;28(1):673-83. doi: 10.1093/molbev/msq236. Epub 2010 Sep 6.
Population genetics encompasses a strong theoretical and applied research tradition on the multiple demographic processes that shape the genetic variation present within a species. When several distinct populations exist in the current generation, it is often natural to consider the pattern of their divergence from a single ancestral population in terms of a binary tree structure. Inference about such population histories based on molecular data has been an intensive research topic in recent years. The most common approach uses coalescent theory to model genealogies of individuals sampled from the current populations. Such methods are able to compare several different evolutionary scenarios and to estimate demographic parameters. However, their major limitation is the enormous computational complexity associated with the indirect modeling of the demographies, which limits their application to small data sets. Here, we propose a novel Bayesian method for inferring population histories from unlinked single nucleotide polymorphisms, which is also applicable to data sets containing large numbers of individuals from distinct populations. We use an approximation to the neutral Wright-Fisher diffusion to model random fluctuations in allele frequencies. The population histories are modeled as rooted binary trees that represent the historical order of divergence of the different populations. A combination of analytical, numerical, and Monte Carlo integration techniques is used for the inferences. A particularly important feature of our approach is that it provides intuitive measures of the statistical uncertainty associated with the computed estimates, which may be entirely lacking in alternative methods in this context. The potential of our approach is illustrated by analyses of both simulated and real data sets.
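To make the drift component concrete, the following minimal Python sketch simulates the exact discrete neutral Wright-Fisher model that the diffusion describes in the large-population limit. It is an illustration only and not the approximation used in the method, which the abstract does not specify. All function names and parameters (`n_copies` for the number of gene copies, i.e. 2N in a diploid population, and `generations`) are assumptions introduced here. The variance formula is the standard result for neutral drift from a fixed starting frequency.

```python
import random


def wright_fisher_step(p, n_copies, rng):
    """One generation of neutral drift: resample n_copies gene copies.

    Each copy in the next generation carries the allele with probability p,
    so the new count is a binomial draw (built here from Bernoulli trials).
    """
    count = sum(rng.random() < p for _ in range(n_copies))
    return count / n_copies


def drift(p0, n_copies, generations, rng):
    """Allele frequency after a number of generations of pure drift."""
    p = p0
    for _ in range(generations):
        p = wright_fisher_step(p, n_copies, rng)
    return p


def expected_variance(p0, n_copies, generations):
    """Variance of the allele frequency after drift from p0:
    p0 * (1 - p0) * (1 - (1 - 1/n_copies) ** generations).
    """
    return p0 * (1 - p0) * (1 - (1 - 1 / n_copies) ** generations)


def simulate_split(p_ancestral, n_copies, generations, rng):
    """One divergence event of a binary population tree (hypothetical helper):
    two daughter populations drift independently from a shared ancestral
    allele frequency.
    """
    left = drift(p_ancestral, n_copies, generations, rng)
    right = drift(p_ancestral, n_copies, generations, rng)
    return left, right
```

Calling `simulate_split(0.3, 50, 10, random.Random(1))` returns the allele frequencies of two daughter populations after ten generations of independent drift. Repeating the call over many unlinked loci gives the kind of between-population frequency differences from which a divergence order could in principle be inferred.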