Alzheimer's Disease Knowledge Graph Enhances Knowledge Discovery and Disease Prediction.

Authors

Yang Yue, Yu Kaixian, Gao Shan, Yu Sheng, Xiong Di, Qin Chuanyang, Chen Huiyuan, Tang Jiarui, Tang Niansheng, Zhu Hongtu

Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill.

Independent Researcher, Shanghai, P.R. China.

Publication information

bioRxiv. 2024 Jul 5:2024.07.03.601339. doi: 10.1101/2024.07.03.601339.

Abstract

BACKGROUND

Alzheimer's disease (AD), a progressive neurodegenerative disorder, continues to increase in prevalence without any effective treatments to date. In this context, knowledge graphs (KGs) have emerged as a pivotal tool in biomedical research, offering new perspectives on drug repurposing and biomarker discovery by analyzing intricate network structures. Our study seeks to build an AD-specific knowledge graph, highlighting interactions among AD, genes, variants, chemicals, drugs, and other diseases. The goal is to shed light on existing treatments, potential targets, and diagnostic methods for AD, thereby aiding in drug repurposing and the identification of biomarkers.

RESULTS

We annotated 800 PubMed abstracts and leveraged GPT-4 for text augmentation to enrich our training data for named entity recognition (NER) and relation classification. A comprehensive data mining model, integrating NER and relation classification, was trained on the annotated corpus. This model was subsequently applied to extract relation triplets from unannotated abstracts. To enhance entity linking, we utilized a suite of reference biomedical databases and refined the linking accuracy through abbreviation resolution. As a result, we identified 3,199,276 entity mentions and 633,733 triplets, elucidating connections between 5,000 unique entities. These connections were pivotal in constructing a comprehensive Alzheimer's Disease Knowledge Graph (ADKG). We also integrated the ADKG constructed after entity linking with other biomedical databases. The ADKG served as training data for knowledge graph embedding models, and the high-ranking predicted triplets were supported by evidence, underscoring the utility of the ADKG in generating testable scientific hypotheses. Further application of the ADKG in predictive modeling using UK Biobank data showed that ADKG-based models outperformed others, as evidenced by higher areas under the receiver operating characteristic (ROC) curve.
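The step from extracted triplets to a graph can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abbreviation table, the entity dictionary, the identifier strings, and the function names `link_entity` and `build_graph` are all hypothetical, chosen only to show how abbreviation resolution followed by dictionary-based linking might collapse different surface mentions into one edge.

```python
from collections import Counter

# Hypothetical lookup tables for illustration; the paper links against
# reference biomedical databases that the abstract does not name.
ABBREVIATIONS = {"AD": "Alzheimer's disease", "APP": "amyloid precursor protein"}
ENTITY_IDS = {
    "alzheimer's disease": "Disease:AD",
    "amyloid precursor protein": "Gene:APP",
    "donepezil": "Drug:donepezil",
}

def link_entity(mention):
    """Expand a known abbreviation, then look the name up case-insensitively."""
    expanded = ABBREVIATIONS.get(mention, mention)
    return ENTITY_IDS.get(expanded.lower())  # None when the mention cannot be linked

def build_graph(raw_triplets):
    """Count how often each linked (head, relation, tail) edge was extracted."""
    edges = Counter()
    for head, relation, tail in raw_triplets:
        h, t = link_entity(head), link_entity(tail)
        if h is not None and t is not None:  # drop triplets with unlinked entities
            edges[(h, relation, t)] += 1
    return edges

# Toy extracted triplets; the mentions differ in case and abbreviation.
raw = [
    ("donepezil", "treats", "AD"),
    ("Donepezil", "treats", "Alzheimer's disease"),
    ("APP", "associated_with", "AD"),
    ("unknown compound", "treats", "AD"),
]
graph = build_graph(raw)
for edge, count in sorted(graph.items()):
    print(edge, count)
```

In this toy input the two donepezil mentions merge into a single edge with a count of 2, and the triplet whose head cannot be linked is discarded. A real system would need context-aware abbreviation resolution, since a short form such as "AD" can be ambiguous across abstracts.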

CONCLUSION

The ADKG is a valuable resource for generating hypotheses and enhancing predictive models, highlighting its potential to advance AD research and treatment strategies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3792/11245034/08237b554a25/nihpp-2024.07.03.601339v1-f0001.jpg