DRIVE-KG: Enhancing variant-phenotype association discovery in understudied complex diseases using heterogeneous knowledge graphs.

Author information

Rajagopalan Ananya, Nguyen Tram Anh, Guare Lindsay A, Garao Rico Andre Luis, Venkatesh Rasika, Caruth Lannawill, Verma Anurag, Ritchie Marylyn D, Hall Molly A, Romano Joseph D, Setia-Verma Shefali

Affiliations

Genomics and Computational Biology Graduate Program.

Department of Genetics.

Publication information

medRxiv. 2025 Aug 21:2025.08.19.25333942. doi: 10.1101/2025.08.19.25333942.

DOI:10.1101/2025.08.19.25333942

PMID:40894144

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12393615/

Abstract

Multi-omics data are instrumental in obtaining a comprehensive picture of complex biological systems. This is particularly useful for women's health conditions such as endometriosis, which has been historically understudied despite its high prevalence (around 10% of women of reproductive age). Consequently, endometriosis has limited genetic characterization: current genome-wide association studies explain only 11% of its 47% total estimated heritability. Graph representations provide an intuitive and meaningful way to relate concepts across diverse data sources and to address the fundamental sparsity and dimensionality challenges of multi-omics data analysis. Here we present DRIVE-KG (Disease Risk Inference and Variant Exploration-Knowledge Graph), which uses a heterogeneous graph representation to integrate biological data from multi-omics datasets: dbSNP, NCBI Human Gene, OmicsPred, GTEx, and Open Targets. We drew directly from the knowledge captured in these data, using nodes to represent genes, single nucleotide polymorphisms (SNPs), proteins, and phenotypes, and edges to represent relationships between these concepts. We trained two models using DRIVE-KG: a link prediction model to suggest associations between SNPs and two pilot phenotypes (endometriosis and obesity), and a graph convolutional network (GCN) to classify patient-level endometriosis status. We conducted the patient-level classification using data from 1,441 Penn Medicine BioBank participants with gold-standard, chart-reviewed endometriosis status. The link prediction model uncovered 66 high-confidence (score ≥ 0.95), previously unreported SNP-endometriosis associations. Many of these variants were linked to obesity/body mass index traits (24.2%), lipid metabolism (6%), and depressive disorders (4.5%), in agreement with emerging hypotheses about endometriosis etiology. In contrast, 11% of the 149 high-confidence candidate SNP-obesity associations (score ≥ 0.9888) were in linkage disequilibrium (LD) with known obesity associations. The GCN for classifying patient endometriosis status achieved an AUPRC of 0.738, compared with 0.679 for a genetic risk score. Despite this moderate improvement, we found that the GCN learned meaningful stratification of underlying adenomyosis signal and severe grades of endometriosis. We have demonstrated that heterogeneous integration of multi-omics data is valuable for diverse downstream tasks, including discovery and clinical prediction, particularly for understudied diseases where traditional genomic approaches are insufficient.
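To make the link prediction step more concrete, the following is a minimal sketch of how a heterogeneous graph with typed nodes and edges might be represented, and how candidate SNP-phenotype links could be scored and filtered against a confidence threshold. It is not the DRIVE-KG implementation: the node identifiers (rsA, GENE1, and so on), the relation names, and the helper functions are all hypothetical, and a simple gene-overlap (Jaccard) heuristic stands in for the trained link prediction model, whose architecture the abstract does not describe.

```python
from collections import defaultdict

# Toy typed edges: (source_type, source_id, relation, target_type, target_id).
# All identifiers are invented for illustration; none come from DRIVE-KG.
EDGES = [
    ("snp", "rsA", "maps_to", "gene", "GENE1"),
    ("snp", "rsB", "maps_to", "gene", "GENE1"),
    ("snp", "rsB", "maps_to", "gene", "GENE2"),
    ("snp", "rsC", "maps_to", "gene", "GENE3"),
    ("gene", "GENE1", "associated_with", "phenotype", "endometriosis"),
    ("gene", "GENE2", "associated_with", "phenotype", "endometriosis"),
    ("gene", "GENE3", "associated_with", "phenotype", "obesity"),
    # An already-known SNP-phenotype association, excluded from "novel" results.
    ("snp", "rsA", "associated_with", "phenotype", "endometriosis"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map keyed by (node_type, node_id)."""
    adj = defaultdict(set)
    for src_type, src, _rel, dst_type, dst in edges:
        adj[(src_type, src)].add((dst_type, dst))
        adj[(dst_type, dst)].add((src_type, src))
    return adj


def gene_overlap_score(adj, snp, phenotype):
    """Jaccard overlap between the gene neighbours of a SNP and of a phenotype.

    This heuristic is an assumed stand-in for a learned link predictor.
    """
    snp_genes = {n for n in adj[("snp", snp)] if n[0] == "gene"}
    pheno_genes = {n for n in adj[("phenotype", phenotype)] if n[0] == "gene"}
    union = snp_genes | pheno_genes
    return len(snp_genes & pheno_genes) / len(union) if union else 0.0


def novel_candidates(edges, phenotype, threshold):
    """Return (snp, score) pairs at or above threshold that are not already known."""
    adj = build_adjacency(edges)
    snps = sorted({src for t, src, _, _, _ in edges if t == "snp"})
    known = {
        src
        for t, src, _, dst_type, dst in edges
        if t == "snp" and dst_type == "phenotype" and dst == phenotype
    }
    scored = [
        (snp, gene_overlap_score(adj, snp, phenotype))
        for snp in snps
        if snp not in known
    ]
    return [(snp, score) for snp, score in scored if score >= threshold]


if __name__ == "__main__":
    # 0.95 mirrors the endometriosis cutoff quoted in the abstract.
    print(novel_candidates(EDGES, "endometriosis", threshold=0.95))
```

In this toy graph, only SNPs without an existing edge to the phenotype are scored, which loosely mirrors the abstract's emphasis on previously unreported associations. A real pipeline would replace the heuristic with a trained model and would calibrate the threshold per phenotype, as the differing cutoffs (0.95 and 0.9888) in the abstract suggest.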

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b30f/12393615/93402ccf541a/nihpp-2025.08.19.25333942v1-f0001.jpg

Similar articles

Prescription of Controlled Substances: Benefits and Risks.

Identification and Validation of Novel Combinatorial Genetic Risk Factors for Endometriosis across Multiple UK and US Patient Cohorts.

medRxiv. 2025 Aug 15:2025.08.13.25333595. doi: 10.1101/2025.08.13.25333595.

Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.

Blood biomarkers for the non-invasive diagnosis of endometriosis.

Cochrane Database Syst Rev. 2016 May 1;2016(5):CD012179. doi: 10.1002/14651858.CD012179.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

Endometrial biomarkers for the non-invasive diagnosis of endometriosis.

Cochrane Database Syst Rev. 2016 Apr 20;4(4):CD012165. doi: 10.1002/14651858.CD012165.

The quantity, quality and findings of network meta-analyses evaluating the effectiveness of GLP-1 RAs for weight loss: a scoping review.

Health Technol Assess. 2025 Jun 25:1-73. doi: 10.3310/SKHT8119.

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2020 Jan 9;1(1):CD011535. doi: 10.1002/14651858.CD011535.pub3.
