Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
Nature. 2020 May;581(7809):444-451. doi: 10.1038/s41586-020-2287-8. Epub 2020 May 27.
Structural variants (SVs) rearrange large segments of DNA and can have profound consequences in evolution and human disease. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral in the interpretation of single-nucleotide variants (SNVs). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings. This SV resource is freely distributed via the gnomAD browser and will have broad utility in population genetics, disease-association studies, and diagnostic screening.