Boitard Simon, Rodríguez Willy, Jay Flora, Mona Stefano, Austerlitz Frédéric
Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 - CNRS & MNHN & UPMC & EPHE, Ecole Pratique des Hautes Etudes, Sorbonne Universités, Paris, France.
GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France.
PLoS Genet. 2016 Mar 4;12(3):e1005877. doi: 10.1371/journal.pgen.1005877. eCollection 2016 Mar.
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
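The two classes of summary statistics named in the abstract can be illustrated with a minimal sketch. This is not the PopSizeABC implementation: the function names, the 0/1/2 genotype coding, the bin edges and the `min_maf` parameter are assumptions made for illustration. Zygotic linkage disequilibrium is taken here as the squared Pearson correlation between unphased genotype counts at two SNPs, and the spectrum is folded by using minor allele counts, so neither phasing nor polarization is required.

```python
from itertools import combinations


def folded_afs(genotypes):
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes: list of SNPs; each SNP is a list of per-individual counts
    (0, 1 or 2) of an arbitrarily chosen allele.
    Returns the proportion of segregating SNPs with minor allele count
    1..n, where n is the number of diploid individuals.
    """
    n = len(genotypes[0])
    counts = [0] * (n + 1)
    for snp in genotypes:
        c = sum(snp)
        counts[min(c, 2 * n - c)] += 1  # folding: use the minor allele count
    segregating = sum(counts[1:])
    if segregating == 0:
        return [0.0] * n
    return [k / segregating for k in counts[1:]]


def zygotic_r2(g1, g2):
    """Squared correlation between genotype counts at two SNPs.

    Returns None when either SNP has no genotype variance in the sample.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return None
    return cov * cov / (v1 * v2)


def mean_ld_by_distance(positions, genotypes, bin_edges, min_maf=0.0):
    """Average zygotic r2 per bin of physical distance.

    bin_edges: increasing distances; bin i covers [bin_edges[i], bin_edges[i+1]).
    min_maf: hypothetical filter keeping only SNPs with common alleles.
    Returns one mean per bin, or None for a bin with no usable SNP pair.
    """
    n = len(genotypes[0])

    def maf(snp):
        f = sum(snp) / (2 * n)
        return min(f, 1 - f)

    kept = [(p, g) for p, g in zip(positions, genotypes) if maf(g) >= min_maf]
    sums = [0.0] * (len(bin_edges) - 1)
    pairs = [0] * (len(bin_edges) - 1)
    for (p1, g1), (p2, g2) in combinations(kept, 2):
        d = abs(p2 - p1)
        for i in range(len(sums)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                r2 = zygotic_r2(g1, g2)
                if r2 is not None:
                    sums[i] += r2
                    pairs[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, pairs)]


# Toy data (made up): 4 SNPs typed in 4 diploid individuals.
toy_positions = [100, 200, 5000, 5100]
toy_genotypes = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 0], [0, 0, 0, 1]]
print(folded_afs(toy_genotypes))
print(mean_ld_by_distance(toy_positions, toy_genotypes, [0, 1000, 10000]))
```

Raising `min_maf` drops rare variants before the linkage disequilibrium is averaged, which mirrors the abstract's remark that robustness to sequencing errors depends on computing summary statistics from SNPs with common alleles. In an approximate Bayesian computation setting, the same statistics would be computed on both observed and simulated data and then compared.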