Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
Nat Genet. 2022 Jun;54(6):827-836. doi: 10.1038/s41588-022-01087-y. Epub 2022 Jun 6.
Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency.
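As a rough illustration of what combining several S2G strategies can look like, the sketch below forms a weighted sum of per-strategy SNP-gene linking scores and rescales each SNP's scores to sum to 1. The abstract does not specify the combination rule, the weights, or the constituent strategies, so the function names (`combine_s2g`, `top_gene`), the normalization, the strategy labels, and all SNP, gene, and weight values here are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def combine_s2g(strategies, weights):
    """Combine per-strategy SNP-gene linking scores into one score per link.

    strategies: {strategy_name: {(snp, gene): score}}
    weights:    {strategy_name: weight}  (hypothetical values, not from the paper)
    Returns {(snp, gene): combined_score}, rescaled so that the scores
    of each SNP sum to 1 (an assumed normalization).
    """
    combined = defaultdict(float)
    for name, links in strategies.items():
        w = weights.get(name, 0.0)
        for (snp, gene), score in links.items():
            combined[(snp, gene)] += w * score

    # Total combined score per SNP, used for rescaling.
    totals = defaultdict(float)
    for (snp, _gene), score in combined.items():
        totals[snp] += score

    # Drop SNPs whose total is zero (no weighted evidence for any gene).
    return {
        (snp, gene): score / totals[snp]
        for (snp, gene), score in combined.items()
        if totals[snp] > 0
    }


def top_gene(links, snp):
    """Return the gene with the highest combined score for a SNP, or None."""
    candidates = {g: s for (s_id, g), s in links.items() if s_id == snp}
    return max(candidates, key=candidates.get) if candidates else None


# Toy input: two made-up strategies scoring one made-up SNP.
strategies = {
    "exon": {("rs1", "GENE_A"): 1.0},
    "eqtl": {("rs1", "GENE_A"): 0.5, ("rs1", "GENE_B"): 0.5},
}
weights = {"exon": 2.0, "eqtl": 1.0}

links = combine_s2g(strategies, weights)
best = top_gene(links, "rs1")
```

In this toy input the SNP is linked mostly to `GENE_A`, because both strategies support it and the higher-weighted strategy supports only that gene. In the framework the abstract describes, the strategies are evaluated and combined on the basis of heritability rather than with weights chosen by hand as they are here.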