Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Nature. 2020 May;581(7809):434-443. doi: 10.1038/s41586-020-2308-7. Epub 2020 May 27.
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
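As a rough illustration of what placing genes "along a spectrum that represents tolerance to inactivation" could look like, the sketch below compares the number of predicted loss-of-function (pLoF) variants observed in a gene with the number a mutation-rate model would expect, then ranks genes by that ratio. The abstract does not state the metric used, so the plain observed/expected ratio here is an assumption, and the gene names and counts are hypothetical placeholders rather than gnomAD data. Only the two cohort sizes come from the text.

```python
from dataclasses import dataclass

# Cohort sizes quoted in the abstract.
EXOMES = 125_748
GENOMES = 15_708
TOTAL_SAMPLES = EXOMES + GENOMES  # 141,456 sequenced individuals


@dataclass
class GeneCounts:
    gene: str             # hypothetical gene label
    observed_plof: int    # pLoF variants seen in the cohort
    expected_plof: float  # pLoF variants predicted by a mutation-rate model


def oe_ratio(g: GeneCounts) -> float:
    """Observed/expected pLoF ratio; lower values suggest stronger depletion."""
    if g.expected_plof <= 0:
        raise ValueError(f"expected count must be positive for {g.gene}")
    return g.observed_plof / g.expected_plof


def rank_by_constraint(genes: list[GeneCounts]) -> list[GeneCounts]:
    """Order genes from most depleted of pLoF variants to most tolerant."""
    return sorted(genes, key=oe_ratio)


# Made-up counts for illustration only.
EXAMPLE = [
    GeneCounts("GENE_A", observed_plof=2, expected_plof=40.0),
    GeneCounts("GENE_B", observed_plof=30, expected_plof=32.0),
    GeneCounts("GENE_C", observed_plof=10, expected_plof=25.0),
]

if __name__ == "__main__":
    print(f"Total samples: {TOTAL_SAMPLES}")
    for g in rank_by_constraint(EXAMPLE):
        print(f"{g.gene}: o/e = {oe_ratio(g):.3f}")
```

A raw ratio is noisy for short genes with few expected variants, which is one reason very large sample sizes matter; a practical analysis would also account for the uncertainty around each estimate.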