Department of Genetics, Yale School of Medicine, New Haven, CT, USA.
Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Victoria, Australia.
Nature. 2024 Nov;635(8038):390-397. doi: 10.1038/s41586-024-08048-x. Epub 2024 Oct 16.
Mitochondrial DNA (mtDNA) has an important yet often overlooked role in health and disease. Constraint models quantify the removal of deleterious variation from the population by selection and represent powerful tools for identifying genetic variation that underlies human phenotypes. However, nuclear constraint models are not applicable to mtDNA, owing to its distinct features. Here we describe the development of a mitochondrial genome constraint model and its application to the Genome Aggregation Database (gnomAD), a large-scale population dataset that reports mtDNA variation across 56,434 human participants. Specifically, we analyse constraint by comparing the observed variation in gnomAD to that expected under neutrality, which was calculated using a mtDNA mutational model and observed maximum heteroplasmy-level data. Our results highlight strong depletion of expected variation, which suggests that many deleterious mtDNA variants remain undetected. To aid their discovery, we compute constraint metrics for every mitochondrial protein, tRNA and rRNA gene, which revealed a range of intolerance to variation. We further characterize the most constrained regions within genes through regional constraint and identify the most constrained sites within the entire mitochondrial genome through local constraint, which showed enrichment of pathogenic variation. Constraint also clustered in three-dimensional structures, which provided insight into functionally important domains and their disease relevance. Notably, we identify constraint at often overlooked sites, including in rRNA and noncoding regions. Last, we demonstrate that these metrics can improve the discovery of deleterious variation that underlies rare and common phenotypes.
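The central comparison in the abstract, observed variation in gnomAD against variation expected under neutrality, can be illustrated with an observed:expected (o/e) ratio. The sketch below rests on assumptions that are not taken from the abstract. Each possible variant contributes its maximum observed heteroplasmy (0.0 if it is never seen) to the observed total. The expected total comes from a simple least-squares line relating maximum heteroplasmy to a per-variant mutability score, fitted on putatively neutral variants only. The `Variant` fields, the neutral set, and the linear form are all hypothetical. This is a toy illustration of the o/e idea, not the authors' published model.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical fields, for illustration only.
    mutability: float  # relative likelihood of this mutation from a mutational model
    max_het: float     # maximum heteroplasmy observed in the population (0.0 if unseen)
    neutral: bool      # True if treated as neutral when calibrating expectations


def fit_neutral_model(variants):
    """Fit max_het ~ a + b * mutability by least squares, using neutral variants only."""
    xs = [v.mutability for v in variants if v.neutral]
    ys = [v.max_het for v in variants if v.neutral]
    if len(xs) < 2:
        raise ValueError("need at least two neutral variants to fit a line")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("neutral variants must span more than one mutability value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def oe_ratio(group, intercept, slope):
    """Observed:expected ratio for a set of variants (e.g. all variants in one gene)."""
    observed = sum(v.max_het for v in group)
    expected = sum(intercept + slope * v.mutability for v in group)
    if expected <= 0:
        raise ValueError("expected value must be positive")
    return observed / expected


# Usage: calibrate on neutral variants, then score a hypothetical gene.
calibration = [Variant(1.0, 0.1, True), Variant(2.0, 0.2, True), Variant(3.0, 0.3, True)]
a, b = fit_neutral_model(calibration)
gene = [Variant(2.0, 0.0, False), Variant(4.0, 0.1, False)]
print(round(oe_ratio(gene, a, b), 3))  # prints 0.167
```

Under this toy setup, an o/e ratio well below 1 means the group carries far less variation than neutrality predicts. That is the sense in which the abstract describes "depletion of expected variation" as a signal of constraint.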