Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, Oregon
Genetics. 2020 May;215(1):193-214. doi: 10.1534/genetics.120.303143. Epub 2020 Mar 24.
Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here, we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies (GWAS). We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright's neighborhood size is < 100 and sampling is spatially clustered. "Stepping-stone" models reproduce some of these effects, but discretizing the landscape introduces artifacts that in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations revealed surprisingly little systematic bias. We also show that the combination of spatially autocorrelated environments and limited dispersal causes GWAS to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.
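The threshold quoted above refers to Wright's neighborhood size, conventionally defined as N_W = 4πρσ², where ρ is population density (individuals per unit area) and σ is the dispersal distance (the standard deviation of parent-offspring displacement along one axis). The following is a minimal sketch of that calculation; the function name and the density and dispersal values are illustrative assumptions, not parameters taken from the paper.

```python
import math


def neighborhood_size(density, sigma):
    """Wright's neighborhood size, N_W = 4 * pi * density * sigma**2.

    density: individuals per unit area (hypothetical input)
    sigma:   dispersal distance, in the same length units (hypothetical input)
    """
    if density < 0 or sigma < 0:
        raise ValueError("density and sigma must be non-negative")
    return 4.0 * math.pi * density * sigma ** 2


# Illustrative values only: a fixed density and three dispersal distances.
for sigma in (0.5, 1.0, 2.0):
    n_w = neighborhood_size(density=5.0, sigma=sigma)
    regime = "below" if n_w < 100 else "at or above"
    print(f"sigma={sigma}: N_W={n_w:.1f} ({regime} 100)")
```

Because N_W scales with the square of σ, halving the dispersal distance at constant density cuts the neighborhood size by a factor of four, so modest reductions in dispersal can move a population into the N_W < 100 regime where the abstract reports the strongest departures from well-mixed expectations.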