Department of Biology, Pennsylvania State University, University Park, PA.
Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA.
Mol Biol Evol. 2020 Oct 1;37(10):3023-3046. doi: 10.1093/molbev/msaa115.
Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be used to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach for inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and, additionally, to infer the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.
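The general idea of a likelihood ratio test on a distorted haplotype frequency spectrum can be illustrated with a minimal Python sketch. This is not the authors' software or their exact model. The distortion rule (moving a fraction `shift` of the frequency mass from the lower classes equally onto the top `m` classes), the multinomial likelihood, the grid of `shift` values, and all function names are assumptions made for illustration only. The null spectrum `null_p` stands in for a background (for example, genome-wide average) truncated spectrum and is assumed to be strictly positive and sorted in descending order.

```python
import math
from collections import Counter


def truncated_counts(haplotypes, k):
    """Counts of the k most common haplotypes in a window, descending, zero-padded."""
    counts = sorted(Counter(haplotypes).values(), reverse=True)[:k]
    return counts + [0] * (k - len(counts))


def distort(p, m, shift):
    """Hypothetical distortion: move a fraction `shift` of the mass in classes
    m+1..K onto the top m classes, split equally. The result still sums to 1."""
    moved = shift * sum(p[m:])
    return [p[i] + moved / m if i < m else (1 - shift) * p[i]
            for i in range(len(p))]


def log_lik(counts, freqs):
    """Multinomial log-likelihood up to a constant; empty classes contribute nothing."""
    return sum(n * math.log(f) for n, f in zip(counts, freqs) if n > 0)


def sweep_statistic(haplotypes, null_p,
                    shifts=tuple(i / 10 for i in range(1, 10))):
    """Return (T, m): twice the best log-likelihood gain over the null spectrum,
    and the number of high-frequency classes m that achieves it.
    Returns (0.0, 0) when no distorted spectrum fits better than the null."""
    k = len(null_p)
    counts = truncated_counts(haplotypes, k)
    ll_null = log_lik(counts, null_p)
    best_t, best_m = 0.0, 0
    for m in range(1, k):
        for s in shifts:
            t = 2 * (log_lik(counts, distort(null_p, m, s)) - ll_null)
            if t > best_t:
                best_t, best_m = t, m
    return best_t, best_m


# Toy window: one haplotype dominates, as expected after a hard sweep.
background = [0.4, 0.3, 0.2, 0.1]          # assumed background spectrum
window = ["A"] * 18 + ["B"] + ["C"]        # made-up haplotype labels
print(sweep_statistic(window, background))
```

In this toy setup, a window whose haplotype counts match the background spectrum yields a statistic of zero, while a window dominated by one or a few haplotypes yields a positive value together with the number of classes that best explains the excess.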