Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802.
Molecular, Cellular, and Integrative Biosciences at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802.
Genetics. 2020 May;215(1):143-171. doi: 10.1534/genetics.120.303137. Epub 2020 Mar 9.
Positive selection causes beneficial alleles to rise to high frequency, resulting in a selective sweep of the diversity surrounding the selected sites. Accordingly, the signature of a selective sweep in an ancestral population may still remain in its descendants. Identifying signatures of selection in the ancestor that are shared among its descendants is important to contextualize the timing of a sweep, but few methods exist for this purpose. We introduce the statistic SS-H12, which can identify genomic regions under shared positive selection across populations and is based on the theory of the expected haplotype homozygosity statistic H12, which detects recent hard and soft sweeps from the presence of high-frequency haplotypes. SS-H12 is distinct from comparable statistics because it requires a minimum of only two populations, and properly identifies and differentiates between independent convergent sweeps and true ancestral sweeps, with high power and robustness to a variety of demographic models. Furthermore, we can apply SS-H12 in conjunction with the ratio of statistics we term [Formula: see text] and [Formula: see text] to further classify identified shared sweeps as hard or soft. Finally, we identified both previously reported and novel shared sweep candidates from human whole-genome sequences. Previously reported candidates include the well-characterized ancestral sweeps at and in Indo-Europeans, as well as worldwide. Novel candidates include an ancestral sweep at in sub-Saharan Africans involved in regulating the platelet response and implicated in sudden cardiac death, and a convergent sweep at between European and East Asian populations that may explain their different insulin responses.
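The abstract builds on the single-population statistic H12, so a minimal sketch of that underlying quantity may help. It assumes the standard definitions from the earlier H12 literature: with haplotype frequencies sorted so that p1 >= p2 >= ..., H1 is the sum of squared frequencies, H12 pools the two most frequent haplotypes before squaring, and H2 is H1 without the p1 term. The function names and the toy windows are hypothetical, and this is not the SS-H12 computation itself, which additionally compares haplotype frequencies across populations.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return haplotype frequencies in one window, sorted from most to least common."""
    if not haplotypes:
        raise ValueError("need at least one sampled haplotype")
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


def h1(freqs):
    """Expected haplotype homozygosity: sum of squared frequencies."""
    return sum(p * p for p in freqs)


def h12(freqs):
    """H12: pool the two most frequent haplotypes, then sum squared frequencies."""
    if len(freqs) < 2:
        return h1(freqs)
    return (freqs[0] + freqs[1]) ** 2 + sum(p * p for p in freqs[2:])


def h2_over_h1(freqs):
    """H2/H1, where H2 is H1 with the most frequent haplotype's term removed."""
    total = h1(freqs)
    return (total - freqs[0] ** 2) / total


# Hypothetical windows: labels stand in for distinct haplotype strings.
hard_like = haplotype_frequencies(["A"] * 8 + ["B"] + ["C"])      # one dominant haplotype
soft_like = haplotype_frequencies(["A"] * 5 + ["B"] * 4 + ["C"])  # two common haplotypes

for name, freqs in (("hard-like", hard_like), ("soft-like", soft_like)):
    print(name, round(h12(freqs), 3), round(h2_over_h1(freqs), 3))
```

In these toy windows both samples give the same H12, while H2/H1 is much larger when two haplotypes are common. That is the intuition behind using such a ratio alongside the sweep statistic to separate hard from soft sweeps.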