Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
PLoS Genet. 2013;9(2):e1003301. doi: 10.1371/journal.pgen.1003301. Epub 2013 Feb 28.
Large-scale population sequencing studies provide a complete picture of human genetic variation within the studied populations. A key challenge is to identify, among the myriad alleles, those variants that have an effect on molecular function, phenotypes, and reproductive fitness. Most non-neutral variation consists of deleterious alleles segregating at low population frequency due to incessant mutation. To date, studies characterizing selection against deleterious alleles have been based on allele frequency (testing for a relative excess of rare alleles) or ratio of polymorphism to divergence (testing for a relative increase in the number of polymorphic alleles). Here, starting from Maruyama's theoretical prediction (Maruyama T (1974), Am J Hum Genet 26:669-673) that a (slightly) deleterious allele is, on average, younger than a neutral allele segregating at the same frequency, we devised an approach to characterize selection based on allelic age. Unlike existing methods, it compares sets of neutral and deleterious sequence variants at the same allele frequency. When applied to human sequence data from the Genome of the Netherlands Project, our approach distinguishes low-frequency coding non-synonymous variants from synonymous and non-coding variants at the same allele frequency and discriminates between sets of variants independently predicted to be benign or damaging for protein structure and function. The results confirm the abundance of slightly deleterious coding variation in humans.
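Maruyama's prediction can be illustrated with a small forward simulation. The sketch below is not the method of this study: it assumes a haploid Wright-Fisher population of constant size, and the function name `mean_age_at_count`, the population size, the selection coefficient, and the frequency window are all hypothetical choices made for illustration. Each replicate starts as a single new mutant copy, and the allele's age is recorded in every generation in which its copy number falls inside the window, which approximates sampling segregating alleles at a fixed frequency.

```python
import numpy as np


def mean_age_at_count(n_copies, s, lo, hi, n_reps, rng, max_gen=10_000):
    """Mean age, in generations, of mutant alleles observed at lo..hi copies.

    Illustrative haploid Wright-Fisher model (an assumption of this sketch):
    every replicate starts as one new mutant copy with relative fitness 1 + s.
    Returns (mean age, number of observations in the window).
    """
    counts = np.ones(n_reps, dtype=np.int64)  # one new mutant copy per replicate
    age_sum = 0
    n_obs = 0
    for gen in range(1, max_gen + 1):
        p = counts / n_copies
        # Expected mutant frequency after selection (s < 0 means deleterious).
        p_sel = p * (1.0 + s) / (1.0 + s * p)
        # Genetic drift: binomial resampling of the next generation.
        counts = rng.binomial(n_copies, p_sel)
        # Keep only alleles that are still segregating (neither lost nor fixed).
        counts = counts[(counts > 0) & (counts < n_copies)]
        if counts.size == 0:
            break
        # Every allele currently inside the window contributes its age.
        in_bin = int(((counts >= lo) & (counts <= hi)).sum())
        age_sum += in_bin * gen
        n_obs += in_bin
    mean_age = age_sum / n_obs if n_obs else float("nan")
    return mean_age, n_obs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for label, s in [("neutral", 0.0), ("deleterious", -0.05)]:
        age, n_obs = mean_age_at_count(100, s, 10, 20, 20_000, rng)
        print(f"{label:12s} s={s:+.2f}  mean age={age:.1f} generations  (n={n_obs})")
```

Under these assumed settings, the deleterious class is expected to show a lower mean age than the neutral class within the same frequency window, which is the contrast the age-based approach exploits. The simulation conditions on copy number rather than on any haplotype-based age estimate, so it demonstrates only the theoretical expectation, not how allelic age would be inferred from sequence data.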