Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation.
Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Ave., NW Washington, D.C. 20008, USA.
GigaScience. 2020 Mar 1;9(3). doi: 10.1093/gigascience/giaa005.
The demographic history of any population is imprinted in the genomes of the individuals that make up the population. One of the most popular and convenient representations of genetic information is the allele frequency spectrum (AFS), the distribution of allele frequencies in populations. The joint AFS is commonly used to reconstruct the demographic history of multiple populations, and several methods based on diffusion approximation (e.g., ∂a∂i) and ordinary differential equations (e.g., moments) have been developed and applied for demographic inference. These methods provide an opportunity to simulate AFS under a variety of researcher-specified demographic models and to estimate the best model and associated parameters using likelihood-based local optimizations. However, there are no known algorithms to perform global searches of demographic models with a given AFS.
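To make the AFS concrete, the following is a minimal sketch of how a joint AFS for two populations can be tabulated from per-site derived-allele counts, together with the Poisson composite log-likelihood commonly used to compare an observed spectrum with one expected under a demographic model. The function names, the input layout, and the choice to skip non-positive expected entries are assumptions made for illustration; this is not the code of ∂a∂i, moments, or GADMA.

```python
import math


def joint_afs(derived_counts, n1, n2):
    """Tabulate a joint AFS for two populations.

    derived_counts: iterable of (i, j) pairs, one per site, where i and j are
    the numbers of derived alleles observed in population 1 and population 2.
    n1, n2: numbers of sampled chromosomes in each population.
    Entry [i][j] of the result counts the sites with that (i, j) pattern.
    """
    afs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in derived_counts:
        afs[i][j] += 1
    return afs


def poisson_log_likelihood(expected, observed):
    """Poisson composite log-likelihood of an observed AFS.

    Each entry contributes d * log(m) - m, where m is the expected count and
    d the observed count. The log(d!) term is dropped because it does not
    depend on the model. Entries with m <= 0 are skipped (an assumption of
    this sketch, standing in for masked entries).
    """
    total = 0.0
    for exp_row, obs_row in zip(expected, observed):
        for m, d in zip(exp_row, obs_row):
            if m > 0:
                total += d * math.log(m) - m
    return total
```

For example, `joint_afs([(1, 0), (1, 0), (2, 1), (0, 2)], 2, 2)` yields a 3 × 3 table in which the entry `[1][0]` equals 2. Likelihood-based optimizers search for model parameters whose expected spectrum maximizes this kind of score.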
Here, we introduce a new method that implements a global search using a genetic algorithm for the automatic and unsupervised inference of demographic history from joint AFS data. Our method is implemented in the software GADMA (Genetic Algorithm for Demographic Model Analysis, https://github.com/ctlab/GADMA).
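The general idea of a genetic algorithm for global search can be sketched as follows. This is a generic, simplified illustration under stated assumptions, not GADMA's actual implementation: the operator mix (mutation, uniform crossover, random immigrants), the default settings, and the toy fitness function are all hypothetical. In a demographic setting, the fitness would be the log-likelihood of the observed joint AFS under a candidate model.

```python
import math
import random


def toy_fitness(params):
    """Hypothetical multimodal fitness surface with several local optima."""
    x, y = params
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2 + math.cos(5.0 * x)


def genetic_search(fitness, bounds, pop_size=20, generations=50,
                   n_elite=2, mutation_scale=0.1, seed=0):
    """Maximize `fitness` over a box given by `bounds` (list of (lo, hi))."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def mutate(ind):
        # Perturb one randomly chosen parameter and clip it to its bounds.
        child = list(ind)
        k = rng.randrange(len(child))
        lo, hi = bounds[k]
        step = rng.gauss(0.0, mutation_scale * (hi - lo))
        child[k] = min(hi, max(lo, child[k] + step))
        return child

    def crossover(a, b):
        # Uniform crossover: each parameter comes from either parent.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[:n_elite]
        next_gen = list(elite)  # elitism: the best individuals always survive
        while len(next_gen) < pop_size:
            r = rng.random()
            if r < 0.4:
                next_gen.append(mutate(rng.choice(elite)))
            elif r < 0.8:
                next_gen.append(crossover(rng.choice(elite),
                                          rng.choice(population)))
            else:
                next_gen.append(random_individual())  # keeps the search global
        population = next_gen

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

A call such as `best, history = genetic_search(toy_fitness, [(-5, 5), (-5, 5)])` returns the best parameter vector found and the best fitness per generation. Because the elite individuals are carried over unchanged, that fitness never decreases from one generation to the next, while the random immigrants give the search a chance to escape local optima.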
We demonstrate the performance of GADMA by applying it to sequence data from humans and non-model organisms and show that it is able to automatically infer a demographic model close to or even better than the one that was previously obtained manually. Moreover, GADMA is able to infer multiple demographic models at different local optima close to the global one, providing a larger set of possible scenarios to further explore demographic history.