Guo Michael H, Francioli Laurent C, Stenton Sarah L, Goodrich Julia K, Watts Nicholas A, Singer-Berk Moriel, Groopman Emily, Darnowsky Philip W, Solomonson Matthew, Baxter Samantha, Tiao Grace, Neale Benjamin M, Hirschhorn Joel N, Rehm Heidi L, Daly Mark J, O'Donnell-Luria Anne, Karczewski Konrad J, MacArthur Daniel G, Samocha Kaitlin E
Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
bioRxiv. 2023 Aug 21:2023.03.19.533370. doi: 10.1101/2023.03.19.533370.
Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n=125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10⁻⁴). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
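The abstract does not spell out how phase is inferred from population genotypes, so the following is only a minimal sketch of one standard approach, not necessarily the authors' exact method. It assumes a 3×3 table of genotype counts for a variant pair (rows: alternate-allele count at variant A; columns: alternate-allele count at variant B), estimates the four haplotype frequencies with an expectation-maximization (EM) loop, and converts them into a probability that a doubly heterozygous individual carries the variants in trans. The function names, the count layout, and the example counts are all hypothetical.

```python
# Sketch only: EM estimate of two-variant haplotype frequencies from
# population genotype counts, then the probability of being in trans.
# All names and the input layout are illustrative assumptions.

HAPLOTYPES = ("ref_ref", "ref_alt", "alt_ref", "alt_alt")  # (allele at A)_(allele at B)


def estimate_haplotype_freqs(counts, n_iter=200, tol=1e-12):
    """counts[i][j] = individuals with i alt alleles at A and j alt alleles at B."""
    n = counts
    total_haps = 2 * sum(sum(row) for row in n)
    if total_haps == 0:
        raise ValueError("no genotypes observed")

    # Haplotype counts from every genotype except the double heterozygote,
    # which is the only genotype whose phase is ambiguous.
    base = {
        "ref_ref": 2 * n[0][0] + n[0][1] + n[1][0],
        "ref_alt": 2 * n[0][2] + n[0][1] + n[1][2],
        "alt_ref": 2 * n[2][0] + n[1][0] + n[2][1],
        "alt_alt": 2 * n[2][2] + n[1][2] + n[2][1],
    }
    double_hets = n[1][1]

    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(n_iter):
        # E-step: split double heterozygotes between cis and trans.
        cis = freqs["ref_ref"] * freqs["alt_alt"]
        trans = freqs["ref_alt"] * freqs["alt_ref"]
        p_cis = cis / (cis + trans) if (cis + trans) > 0 else 0.5

        # M-step: re-estimate frequencies from expected haplotype counts.
        expected = dict(base)
        expected["ref_ref"] += double_hets * p_cis
        expected["alt_alt"] += double_hets * p_cis
        expected["ref_alt"] += double_hets * (1 - p_cis)
        expected["alt_ref"] += double_hets * (1 - p_cis)
        new_freqs = {h: expected[h] / total_haps for h in HAPLOTYPES}

        converged = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


def probability_in_trans(freqs):
    """P(trans) for a double heterozygote; None if it cannot be estimated."""
    cis = freqs["ref_ref"] * freqs["alt_alt"]
    trans = freqs["ref_alt"] * freqs["alt_ref"]
    if cis + trans == 0:
        return None
    return trans / (cis + trans)


# Hypothetical counts: the two variants are never seen in the same individual.
never_together = [[9980, 10, 0], [10, 0, 0], [0, 0, 0]]
# Hypothetical counts: the two variants are only ever seen together.
always_together = [[9990, 0, 0], [0, 10, 0], [0, 0, 0]]

p_trans_apart = probability_in_trans(estimate_haplotype_freqs(never_together))
p_trans_together = probability_in_trans(estimate_haplotype_freqs(always_together))
```

In this sketch, variants that are each seen in carriers but never together yield a high estimated probability of being in trans, while variants that always co-occur yield a low one. Real data would also need handling for variant pairs too rare to give a reliable estimate.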