Kamm Jack, Terhorst Jonathan, Durbin Richard, Song Yun S
Wellcome Sanger Institute, Hinxton, Cambridge, UK.
Department of Genetics, University of Cambridge, Cambridge, UK.
J Am Stat Assoc. 2020;115(531):1472-1487. doi: 10.1080/01621459.2019.1635482. Epub 2019 Jul 22.
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. However, admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package, momi2.
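To make the phrase "histogram of allele counts" concrete, the following minimal sketch tabulates an observed single-population SFS and a multipopulation (joint) SFS from a toy 0/1 haplotype matrix using NumPy. It is an illustration only and does not use or represent the momi2 API; the function names `sample_frequency_spectrum` and `joint_sfs`, the toy data, and the population labels "A" and "B" are all hypothetical. It also computes only the observed spectrum, not the expected SFS under a demographic model that the article is concerned with.

```python
import numpy as np


def sample_frequency_spectrum(genotypes):
    """Observed SFS for one population (hypothetical helper).

    genotypes: array of shape (n_sites, n_haplotypes) with 0 = ancestral
    and 1 = derived allele. Entry k of the result is the number of sites
    at which exactly k haplotypes carry the derived allele.
    """
    genotypes = np.asarray(genotypes)
    n_haplotypes = genotypes.shape[1]
    derived_counts = genotypes.sum(axis=1)
    return np.bincount(derived_counts, minlength=n_haplotypes + 1)


def joint_sfs(genotypes, pop_labels):
    """Observed multipopulation SFS (hypothetical helper).

    pop_labels assigns each haplotype (column) to a population. The result
    is an array with one axis per population; entry (i, j, ...) counts the
    sites with i derived alleles in the first population, j in the second,
    and so on, with populations taken in sorted label order.
    """
    genotypes = np.asarray(genotypes)
    pop_labels = np.asarray(pop_labels)
    pops = sorted(set(pop_labels.tolist()))
    sizes = [int((pop_labels == p).sum()) for p in pops]

    # Per-site derived allele count within each population: (n_sites, n_pops).
    counts = np.stack(
        [genotypes[:, pop_labels == p].sum(axis=1) for p in pops], axis=1
    )

    sfs = np.zeros([s + 1 for s in sizes], dtype=int)
    # Unbuffered add so that sites sharing a configuration are all counted.
    np.add.at(sfs, tuple(counts.T), 1)
    return pops, sfs


# Toy data (assumed for illustration): 5 sites, 4 haplotypes, 2 populations.
genotypes = np.array([
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
])
labels = ["A", "A", "B", "B"]

print(sample_frequency_spectrum(genotypes))
pops, spectrum = joint_sfs(genotypes, labels)
print(pops)
print(spectrum)
```

The joint spectrum has one axis per population, so its size grows multiplicatively with the number of populations and samples. This is one way to see why scaling to many populations is nontrivial.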