Department of Genetics and Evolutionary Biology, University of São Paulo, São Paulo, Brazil.
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.
Genome Biol Evol. 2018 Mar 1;10(3):939-955. doi: 10.1093/gbe/evy054.
Balancing selection maintains advantageous diversity in populations through various mechanisms. Although extensively explored from a theoretical perspective, an empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here, we describe the Non-central Deviation (NCD), a simple yet powerful statistic to detect long-term balancing selection (LTBS) that quantifies how close allele frequencies are to expectations under LTBS, and provides the basis for a neutrality test. NCD can be applied to a single locus or to genomic data, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences with respect to an outgroup species (NCD2). Incorporating fixed differences improves power, and NCD2 has higher power to detect LTBS in humans under different frequencies of the balanced allele(s) than other available methods. Applied to genome-wide data from African and European human populations, in both cases using chimpanzee as an outgroup, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: ∼0.6% of analyzed genomic windows and 0.8% of analyzed positions. Significant windows (P < 0.0001) contain 1.6% of SNPs in the genome, which disproportionately fall within exons and change protein sequence, but are not enriched in putatively regulatory sites. These windows overlap ∼8% of the protein-coding genes, and these genes have a larger number of transcripts than expected by chance, even after controlling for gene length. Our catalog includes known targets of LTBS, but a majority of them (90%) are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes.
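The abstract describes NCD only in words, so the following is a minimal sketch of how such a statistic could be computed, not the authors' implementation. It assumes that NCD is the root-mean-square distance between the minor allele frequencies of the informative sites in a window and a target frequency, and that the NCD2 variant counts each fixed difference to the outgroup as a site with frequency 0. The function name `ncd` and the parameters `target_freq` and `n_fixed_differences` are hypothetical names chosen for illustration.

```python
import math


def ncd(minor_allele_freqs, target_freq=0.5, n_fixed_differences=0):
    """Root-mean-square deviation of site frequencies from a target frequency.

    Assumption (not stated in the abstract): the statistic is
    sqrt(sum((p_i - target_freq)**2) / n) over the n informative sites.
    With n_fixed_differences == 0 only polymorphisms are used (NCD1-like);
    otherwise fixed differences are added as sites with frequency 0 (NCD2-like).
    """
    if not 0.0 < target_freq <= 0.5:
        raise ValueError("target_freq must be in (0, 0.5]")
    if n_fixed_differences < 0:
        raise ValueError("n_fixed_differences must be non-negative")

    freqs = list(minor_allele_freqs)
    for p in freqs:
        if not 0.0 <= p <= 0.5:
            raise ValueError("minor allele frequencies must be in [0, 0.5]")

    # Assumed NCD2 convention: each fixed difference is a site with frequency 0.
    freqs.extend([0.0] * n_fixed_differences)
    if not freqs:
        raise ValueError("window contains no informative sites")

    squared_deviations = sum((p - target_freq) ** 2 for p in freqs)
    return math.sqrt(squared_deviations / len(freqs))


# Hypothetical window: five SNPs plus two fixed differences to the outgroup.
window_freqs = [0.48, 0.45, 0.50, 0.12, 0.40]
print(ncd(window_freqs, target_freq=0.5))                         # polymorphisms only
print(ncd(window_freqs, target_freq=0.5, n_fixed_differences=2))  # with fixed differences
```

Under these assumptions, lower values mean that frequencies in the window sit closer to the target, which is the pattern the abstract associates with LTBS. Adding fixed differences raises the value, so windows with many polymorphisms near the target and few fixed differences stand out more clearly.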