Jacobs Guy S, Sluckin Tim J, Kivisild Toomas
Mathematical Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom; Complexity Institute, Nanyang Technological University, Singapore 637723
Mathematical Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom.
Genetics. 2016 Aug;203(4):1807-25. doi: 10.1534/genetics.115.185900.
During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly's [Formula: see text] and [Formula: see text]) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly's [Formula: see text] offers high power, but is outperformed by a novel statistic that we test, which we call [Formula: see text]. We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics ([Formula: see text], Kelly's [Formula: see text], and [Formula: see text]) are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. While [Formula: see text] replicates most candidates when recombination map data are not available, the [Formula: see text] and [Formula: see text] statistics are more successful when recombination rate variation is controlled for. Given both this and their higher power in simulations of selective sweeps, these statistics are preferred when information on local recombination rate variation is available.
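The formulas for the statistics are elided in this record, but linkage disequilibrium scans of this kind build on the squared correlation r² between pairs of segregating sites. As a minimal sketch, assuming phased biallelic haplotypes coded 0/1 in a NumPy array (rows are haplotypes, columns are sites), the following computes the mean pairwise r² within a window, which is the standard definition of Kelly's statistic. The function name and the toy data are illustrative assumptions and are not taken from the paper, and the sketch does not include the genetic-map correction the abstract describes.

```python
import numpy as np


def mean_pairwise_r2(haplotypes):
    """Mean squared correlation (r^2) over all pairs of segregating sites.

    `haplotypes` is an (n_haplotypes, n_sites) array of 0/1 alleles.
    Monomorphic sites are dropped because r^2 is undefined for them.
    Returns NaN when fewer than two segregating sites remain.
    """
    H = np.asarray(haplotypes, dtype=float)

    # Keep only segregating sites (derived-allele frequency strictly in (0, 1)).
    freq = H.mean(axis=0)
    H = H[:, (freq > 0) & (freq < 1)]

    n_sites = H.shape[1]
    if n_sites < 2:
        return float("nan")

    # Pearson correlation between site columns; for 0/1 data its square
    # equals the usual LD measure r^2 = D^2 / (pA(1-pA) pB(1-pB)).
    r = np.corrcoef(H, rowvar=False)

    # Average r^2 over the strict upper triangle, i.e. each site pair once.
    upper = np.triu_indices(n_sites, k=1)
    return float(np.mean(r[upper] ** 2))


# Toy data (hypothetical): two perfectly linked sites plus one monomorphic site.
linked = np.array([[0, 0, 1],
                   [0, 0, 1],
                   [1, 1, 1],
                   [1, 1, 1]])

# Toy data (hypothetical): two sites in complete linkage equilibrium.
unlinked = np.array([[0, 0],
                     [0, 1],
                     [1, 0],
                     [1, 1]])

linked_value = mean_pairwise_r2(linked)
unlinked_value = mean_pairwise_r2(unlinked)
```

In a real scan this value would be computed in sliding windows along the chromosome and compared against a null expectation. Under variable recombination, the abstract indicates that the expectation should be conditioned on a genetic map.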