The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
Am J Hum Genet. 2012 Dec 7;91(6):1022-32. doi: 10.1016/j.ajhg.2012.10.015.
We have assessed the numbers of potentially deleterious variants in the genomes of apparently healthy humans by using (1) low-coverage whole-genome sequence data from 179 individuals in the 1000 Genomes Pilot Project and (2) current predictions and databases of deleterious variants. Each individual carried 281-515 missense substitutions predicted to be highly damaging, 40-85 of which were homozygous. They also carried 40-110 variants classified by the Human Gene Mutation Database (HGMD) as disease-causing mutations (DMs), 3-24 of them in the homozygous state, and many polymorphisms putatively associated with disease. Whereas many of these DMs are likely to represent disease-allele-annotation errors, between 0 and 8 DMs (0-1 homozygous) per individual are predicted to be highly damaging, and some of them provide information of medical relevance. These analyses emphasize the need for improved annotation of disease alleles both in mutation databases and in the primary literature; some HGMD mutation data have been recategorized on the basis of the present findings, an iterative process that is both necessary and ongoing. Our estimates of deleterious-allele numbers are likely to be subject to both overcounting and undercounting. However, our current best mean estimates of ~400 damaging variants and ~2 bona fide disease mutations per individual are likely to increase rather than decrease as sequencing studies ascertain rare variants more effectively and as additional disease alleles are discovered.