Schrider Daniel R, Kern Andrew D
Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America.
Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America.
PLoS Genet. 2016 Mar 15;12(3):e1005928. doi: 10.1371/journal.pgen.1005928. eCollection 2016 Mar.
Detecting the targets of adaptive natural selection from whole-genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been renewed interest in determining the importance of selection from standing variation in the adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories relevant to human populations, and for distinguishing sweeps from both linked and neutrally evolving regions. We also show that, among its competitors, S/HIC is uniquely robust to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps previously identified using less specific and less sensitive methods.
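The abstract describes S/HIC only at a high level. As a rough illustration of the general idea of labeling a genomic window from the spatial pattern of a summary statistic across neighboring subwindows, the sketch below normalizes per-subwindow values so that only their shape matters, then assigns a class with a 1-nearest-neighbor rule. Everything here is an assumption for illustration and not the authors' implementation: the five-subwindow layout, the use of a single statistic (nucleotide diversity), the class names, the toy training vectors (hypothetical values, not simulation output), and the nearest-neighbor rule, which stands in for whatever supervised learner one would actually train on simulated data.

```python
from math import dist  # Euclidean distance, Python 3.8+

def normalize(values):
    """Scale per-subwindow values to sum to 1, keeping only their spatial shape."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def classify(values, training):
    """Return the label of the training example nearest to `values` (1-NN)."""
    query = normalize(values)
    _, label = min(training, key=lambda ex: dist(query, normalize(ex[0])))
    return label

# Hypothetical diversity values in five adjacent subwindows; the class of
# interest is always the central subwindow.
TRAINING = [
    ([5, 5, 1, 5, 5], "hard"),     # deep central dip in diversity
    ([5, 5, 3, 5, 5], "soft"),     # shallower central dip
    ([1, 4, 5, 5, 5], "linked"),   # dip off to one side of the center
    ([5, 5, 5, 4, 1], "linked"),
    ([5, 5, 5, 5, 5], "neutral"),  # flat profile
]

if __name__ == "__main__":
    print(classify([6, 5, 1.5, 5, 6], TRAINING))  # → hard
```

Normalizing by the sum across subwindows is a deliberate simplification: it discards the overall level of diversity and keeps only where it is reduced, which is the kind of information that separates a sweep in the central window from a sweep nearby. A realistic classifier would combine many statistics (including haplotype-based ones) and be trained on simulations under the assumed demographic model.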