Beichman Annabel C, Phung Tanya N, Lohmueller Kirk E
Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095.
Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California 90095.
G3 (Bethesda). 2017 Nov 6;7(11):3605-3620. doi: 10.1534/g3.117.300259.
Inference of demographic history from genetic data is a primary goal of population genetics of model and nonmodel organisms. Whole genome-based approaches such as the pairwise/multiple sequentially Markovian coalescent (PSMC/MSMC) methods use genomic data from one to four individuals to infer the demographic history of an entire population, while site frequency spectrum (SFS)-based methods use the distribution of allele frequencies in a sample to reconstruct the same historical events. Although both classes of methods are extensively used in empirical studies and perform well on data simulated under simple models, there have been only limited comparisons of them in more complex and realistic settings. Here we use published demographic models based on data from three human populations (Yoruba, descendants of northwest-Europeans, and Han Chinese) as an empirical test case to study the behavior of both inference procedures. We find that several of the demographic histories inferred by the whole genome-based methods do not predict the genome-wide distribution of heterozygosity, nor do they predict the empirical SFS. However, using simulated data, we also find that the whole genome methods can reconstruct the complex demographic models inferred by SFS-based methods, suggesting that the discordant patterns of genetic variation are not attributable to a lack of statistical power, but may reflect unmodeled complexities in the underlying demography. More generally, our findings indicate that demographic inference from a small number of genomes, routine in genomic studies of nonmodel organisms, should be interpreted cautiously, as the resulting models cannot recapitulate other summaries of the data.
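For readers unfamiliar with the two data summaries named in the abstract, the following is a minimal sketch of how a folded SFS and a per-individual heterozygosity could be computed from a small genotype table. It is not the authors' pipeline: the function names `folded_sfs` and `heterozygosity`, the 0/1/2 genotype encoding, and the toy data are all assumptions made for illustration, and real analyses typically compute heterozygosity in genomic windows over all callable sites rather than over a handful of variable sites.

```python
from collections import Counter


def folded_sfs(genotypes):
    """Return the folded site frequency spectrum as a list.

    genotypes: list of sites; each site is a list holding, for every
    diploid individual, the number of alternate alleles (0, 1, or 2).
    Entry k-1 of the result is the number of sites whose minor allele
    appears k times in the sample, for k = 1 .. n_chromosomes // 2.
    """
    n_chrom = 2 * len(genotypes[0])
    counts = Counter()
    for site in genotypes:
        alt = sum(site)
        minor = min(alt, n_chrom - alt)  # fold: count the rarer allele
        if minor > 0:                    # skip sites fixed in the sample
            counts[minor] += 1
    return [counts[k] for k in range(1, n_chrom // 2 + 1)]


def heterozygosity(genotypes, individual):
    """Fraction of the listed sites at which one individual is heterozygous."""
    het = sum(1 for site in genotypes if site[individual] == 1)
    return het / len(genotypes)


# Hypothetical toy data: 5 sites (rows) by 3 diploid individuals (columns).
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [2, 2, 2],
    [0, 0, 2],
    [1, 0, 0],
]

print(folded_sfs(toy))
print(heterozygosity(toy, 0))
```

The SFS pools allele counts across the whole sample, whereas heterozygosity is a property of a single genome, which is why a demographic model fitted to one summary is not guaranteed to predict the other.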