Mousavi Zeinab, Arvanitis Marios, Duong ThuyVy, Brody Jennifer A, Battle Alexis, Sotoodehnia Nona, Shojaie Ali, Arking Dan E, Bader Joel S
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.
Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America.
PLoS Comput Biol. 2025 Jan 7;21(1):e1012725. doi: 10.1371/journal.pcbi.1012725. eCollection 2025 Jan.
Genome-wide association studies (GWAS) have identified genetic variants, usually single-nucleotide polymorphisms (SNPs), associated with human traits, including disease and disease risk. These variants (or causal variants in linkage disequilibrium with them) usually affect the regulation or function of a nearby gene. A GWAS locus can span many genes, however, and prioritizing which gene or genes in a locus are most likely to be causal remains a challenge. Better prioritization and prediction of causal genes could reveal disease mechanisms and suggest interventions.
We describe a new Bayesian method, termed SigNet for significance networks, that combines information both within and across loci to identify the most likely causal gene at each locus. Existing methods focus on individual loci, using evidence from gene distance and expression quantitative trait loci (eQTL). SigNet builds on these methods by sharing information across loci through protein-protein and gene regulatory interaction network data. In an application to cardiac electrophysiology with 226 GWAS loci, only 46 (20%) have within-locus evidence from Mendelian genes, protein-coding changes, or colocalization with eQTL signals. At the remaining 180 loci, which lack functional information, SigNet selects 56 genes other than the minimum-distance gene, corresponding to 31% of the information-poor loci and 25% of the GWAS loci overall. Assessment by pathway enrichment demonstrates improved performance by SigNet. Review of individual loci shows literature evidence for genes selected by SigNet, including PMP22 as a novel causal gene candidate.
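The locus counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported numbers; the variable and function names are illustrative and are not taken from the SigNet software.

```python
# Locus counts as quoted in the abstract.
total_loci = 226                 # GWAS loci in the cardiac electrophysiology application
loci_with_evidence = 46          # loci with Mendelian, coding, or eQTL colocalization evidence
non_nearest_selections = 56      # loci where the selected gene is not the minimum-distance gene

# Loci without within-locus functional evidence ("information-poor" loci).
information_poor_loci = total_loci - loci_with_evidence


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


share_with_evidence = percent(loci_with_evidence, total_loci)
share_of_information_poor = percent(non_nearest_selections, information_poor_loci)
share_of_all_loci = percent(non_nearest_selections, total_loci)

print(information_poor_loci, share_with_evidence,
      share_of_information_poor, share_of_all_loci)
```

The rounded values agree with the figures stated in the abstract: 180 information-poor loci, 20% of loci with within-locus evidence, and 31% and 25% for the non-nearest-gene selections.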