Fine Adam G, Steinrücken Matthias
Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America.
Graduate Program in Biophysical Sciences, University of Chicago, Chicago, Illinois, United States of America.
PLoS Genet. 2025 Jul 22;21(7):e1011769. doi: 10.1371/journal.pgen.1011769. eCollection 2025 Jul.
Detecting and quantifying the strength of selection is a major objective in population genetics. Since selection acts over multiple generations, many approaches have been developed to detect and quantify selection using genetic data sampled at multiple points in time. Such time-series genetic data is commonly analyzed using Hidden Markov Models, but in most cases, under the assumption of additive selection. However, many examples of genetic variation exhibiting non-additive mechanisms exist, making it critical to develop methods that can characterize selection in more general scenarios. Here, we extend a previously introduced expectation-maximization algorithm for the inference of additive selection coefficients to the case of general diploid selection, in which the heterozygote and homozygote fitness are parameterized independently. We furthermore introduce a framework to identify bespoke modes of diploid selection from given data, a heuristic to account for variable population size, and a procedure for aggregating data across linked loci to increase power and robustness. Using extensive simulation studies, we find that our method accurately and efficiently estimates selection coefficients for different modes of diploid selection across a wide range of scenarios; however, power to classify the mode of selection is low unless selection is very strong. We apply our method to ancient DNA samples from Great Britain in the last 4,450 years and detect evidence for selection in six genomic regions, including the well-characterized LCT locus. Our work is the first genome-wide scan characterizing signals of general diploid selection.
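To make concrete what it means for heterozygote and homozygote fitness to be parameterized independently, the following minimal sketch iterates the standard deterministic allele-frequency recursion for a biallelic locus. It is not the authors' inference method (no Hidden Markov Model or expectation-maximization step is involved). The fitness scheme (1, 1 + s_het, 1 + s_hom), the function names, and the mapping from named modes to coefficient pairs are all assumptions made for illustration.

```python
# Illustrative sketch only: deterministic allele-frequency change under
# general diploid selection. Assumed (hypothetical) relative fitnesses:
#   aa: 1,  Aa: 1 + s_het,  AA: 1 + s_hom
# The names next_freq, trajectory, and MODES are not taken from the paper.

def next_freq(p, s_het, s_hom):
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    w_hom = 1.0 + s_hom  # fitness of AA
    w_het = 1.0 + s_het  # fitness of Aa
    mean_fitness = p * p * w_hom + 2.0 * p * q * w_het + q * q
    return (p * p * w_hom + p * q * w_het) / mean_fitness


def trajectory(p0, s_het, s_hom, generations):
    """List of frequencies of A, starting at p0, over the given generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s_het, s_hom))
    return freqs


# Common modes expressed as constraints on the pair (s_het, s_hom),
# each driven by a single coefficient s (assumed parameterization).
MODES = {
    "additive": lambda s: (s / 2.0, s),
    "dominant": lambda s: (s, s),
    "recessive": lambda s: (0.0, s),
    "overdominant": lambda s: (s, 0.0),
}

if __name__ == "__main__":
    for mode, coeffs in MODES.items():
        s_het, s_hom = coeffs(0.1)
        final = trajectory(0.05, s_het, s_hom, 50)[-1]
        print(f"{mode:>12}: frequency after 50 generations = {final:.3f}")
```

Under this assumed parameterization the additive, dominant, and recessive modes are one-parameter slices of the two-parameter (s_het, s_hom) space. Starting from a low frequency, their trajectories diverge only gradually, which is in line with the abstract's remark that power to classify the mode of selection is low unless selection is very strong.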