Yang Yue, Yu Kaixian, Gao Shan, Yu Sheng, Xiong Di, Qin Chuanyang, Chen Huiyuan, Tang Jiarui, Tang Niansheng, Zhu Hongtu
Department of Biostatistics, University of North Carolina at Chapel Hill, USA.
Insilicom LLC, Tallahassee FL, USA.
Comput Biol Med. 2025 Apr 29;192(Pt A):110285. doi: 10.1016/j.compbiomed.2025.110285.
To construct an Alzheimer's Disease Knowledge Graph (ADKG) by extracting and integrating relationships among Alzheimer's disease (AD), genes, variants, chemicals, drugs, and other diseases from biomedical literature, aiming to identify existing treatments, potential targets, and diagnostic methods for AD.
We annotated 800 PubMed abstracts (the ADERC corpus) with 20,886 entities and 4,935 relationships, and augmented the annotations using GPT-4. A SciBERT-based SpERT model trained on this data extracted relations from PubMed abstracts, supported by biomedical databases and by entity linking refined through abbreviation resolution and string matching. The resulting knowledge graph was used to train embedding models that predict novel relationships. ADKG's utility was validated by integrating it with UK Biobank data for predictive modeling.
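As an illustration of how a graph embedding model can score candidate relationships, the following minimal sketch uses a TransE-style distance. The abstract does not name the embedding models used, so TransE is an assumption here, and the entity names, relation name, and 2-dimensional vectors are hypothetical values chosen by hand rather than learned from ADKG.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# They are NOT learned and are NOT taken from ADKG.
entity_vec = {
    "drug_X": (0.0, 0.0),
    "disease_AD": (1.0, 0.0),
    "gene_Y": (0.0, 1.0),
}
relation_vec = {"treats": (1.0, 0.0)}


def transe_score(head, relation, tail):
    """TransE-style score: negative Euclidean distance ||h + r - t||.

    A higher (less negative) score means the triplet is more plausible
    under the embedding.
    """
    h = entity_vec[head]
    r = relation_vec[relation]
    t = entity_vec[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank all other entities as candidate tails for (head, relation, ?)."""
    candidates = [e for e in entity_vec if e != head]
    return sorted(
        candidates,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )


ranking = rank_tails("drug_X", "treats")
print(ranking)
```

In a real setting the vectors would be learned from the extracted triplets, and high-ranking triplets that are absent from the graph would be treated as candidate hypotheses rather than as established findings.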
The ADKG contained 3,199,276 entity mentions and 633,733 triplets, linking more than 5,000 unique entities and capturing complex AD-related interactions. Its graph embedding models produced evidence-supported predictions, enabling testable hypotheses. In UK Biobank predictive modeling, ADKG-enhanced models achieved a higher AUROC (0.928) than models without ADKG enhancement (0.903).
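For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not UK Biobank data, and the function name `auroc` is illustrative.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which
    the positive case is scored higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two positive and two negative cases.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
value = auroc(labels, scores)
print(value)
```

The pairwise form is quadratic in the number of cases, so it is suited to explanation rather than to large cohorts, where a rank-based implementation would normally be used.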
By synthesizing literature-derived insights into a computable framework, ADKG bridges molecular mechanisms to clinical phenotypes, advancing precision medicine in Alzheimer's research. Its structured data and predictive utility underscore its potential to accelerate therapeutic discovery and risk stratification.