Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA.
Nat Commun. 2020 May 27;11(1):2539. doi: 10.1038/s41467-019-12438-5.
Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not classify MNVs accurately, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome whose constituent variants fall within 2 bp of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of three known mutational mechanisms on the generation of MNVs: CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions. Our results demonstrate the value of haplotype-aware variant annotation and refine our understanding of the genome-wide mutational mechanisms of MNVs.
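The two ideas at the core of the abstract can be illustrated with a short sketch. First, an MNV is a set of variants on the same haplotype within 2 bp of one another. Second, two changes in the same codon can translate to an amino acid that neither change produces alone. This is a minimal sketch under stated assumptions and is not the gnomAD pipeline. It assumes the calls are already phased and that each record carries the index (0 or 1) of the haplotype holding the ALT allele. It handles only SNVs and assumes a forward-strand codon translated with the standard genetic code. The names `SNV`, `find_mnv_pairs` and `codon_effects` are hypothetical, and the positions and codon are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product

# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G,
# so product(BASES, repeat=3) enumerates codons in the same order as the string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


@dataclass(frozen=True)
class SNV:
    """Hypothetical minimal record for a phased single-nucleotide variant."""
    chrom: str
    pos: int        # 1-based genomic position
    ref: str
    alt: str
    haplotype: int  # assumption: 0 or 1, the phased copy carrying the ALT allele


def find_mnv_pairs(snvs, max_distance=2):
    """Pair SNVs that sit on the same haplotype within `max_distance` bp."""
    by_haplotype = defaultdict(list)
    for v in snvs:
        # Variants on different haplotypes are never grouped together.
        by_haplotype[(v.chrom, v.haplotype)].append(v)
    pairs = []
    for group in by_haplotype.values():
        group.sort(key=lambda v: v.pos)
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b.pos - a.pos > max_distance:
                    break  # sorted, so every later variant is even farther away
                pairs.append((a, b))
    return pairs


def codon_effects(codon, changes):
    """Translate `codon` with each change applied alone and with all applied together.

    `changes` maps a 0-based offset within the codon to the ALT base.
    """
    def translate_with(subset):
        bases = list(codon)
        for offset, alt in subset:
            bases[offset] = alt
        return CODON_TABLE["".join(bases)]

    individual = {offset: translate_with([(offset, alt)]) for offset, alt in changes.items()}
    combined = translate_with(changes.items())
    return CODON_TABLE[codon], individual, combined


# Hypothetical phased calls: the first two share haplotype 0 and lie 1 bp apart.
calls = [
    SNV("chr1", 1001, "T", "A", haplotype=0),
    SNV("chr1", 1002, "G", "C", haplotype=0),
    SNV("chr1", 1003, "C", "T", haplotype=1),  # other haplotype: not paired
]
print([(a.pos, b.pos) for a, b in find_mnv_pairs(calls)])  # [(1001, 1002)]

# Suppose the pair hits offsets 1 and 2 of a TTG (Leu) codon.
print(codon_effects("TTG", {1: "A", 2: "C"}))  # ('L', {1: '*', 2: 'F'}, 'Y')
```

Read one at a time, the first SNV looks like a stop-gained (TAG) and the second like a Leu to Phe missense (TTC). On the same haplotype, however, the codon becomes TAC (Tyr), which is the kind of combined effect that per-variant annotation misses. Grouping by `(chrom, haplotype)` before sorting is what separates an MNV from two unrelated nearby SNVs. Under this scheme, a homozygous ALT allele would need one record per haplotype.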