Computer Science Division, University of California, Berkeley, California 94720, USA.
Genetics. 2013 Jul;194(3):647-62. doi: 10.1534/genetics.112.149096. Epub 2013 Apr 22.
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and of when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from inferences of natural selection. Demography influences the pattern of genetic variation in a population, and thus genomic data from multiple individuals sampled from one or more present-day populations contain valuable information about past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.
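The abstract's point that a sample of two chromosomes yields few recent coalescence events can be illustrated with a small calculation. The sketch below is not diCal or the PSMC. It assumes the textbook constant-size coalescent in its continuous-time approximation, where n sampled haplotypes in a population of diploid size N coalesce at rate C(n,2)/(2N) per generation. The function name, the choice of N = 10,000, and the 1,000-generation window are hypothetical values chosen only for illustration and do not come from the paper.

```python
import math


def prob_first_coalescence_by(t_generations, n_haplotypes, diploid_size):
    """Probability that at least one coalescence among the sampled haplotypes
    happens within the last t generations.

    Assumes a constant-size population and the continuous-time coalescent
    approximation (an illustrative assumption, not the paper's model).
    """
    # Number of haplotype pairs that could coalesce.
    pairs = math.comb(n_haplotypes, 2)
    # Each pair coalesces at rate 1/(2N) per generation.
    rate = pairs / (2.0 * diploid_size)
    # Waiting time to the first event is exponential with this rate.
    return 1.0 - math.exp(-rate * t_generations)


if __name__ == "__main__":
    # Hypothetical values: N = 10,000 diploids, window of 1,000 generations.
    for n in (2, 4, 10):
        p = prob_first_coalescence_by(1000, n, 10_000)
        print(f"n={n:2d}  P(first coalescence within 1000 generations) = {p:.3f}")
```

Under these assumed values, a single pair has roughly a 5% chance of coalescing in the window at any given locus, while ten haplotypes have roughly a 90% chance of at least one coalescence, which is why larger samples are more informative about recent population sizes.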