Knowledge graph embeddings in the biomedical domain: are they useful? A look at link prediction, rule learning, and downstream polypharmacy tasks.

Author information

Gema Aryo Pradipta, Grabarczyk Dominik, De Wulf Wolf, Borole Piyush, Alfaro Javier Antonio, Minervini Pasquale, Vergari Antonio, Rajan Ajitha

Affiliations

School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom.

International Centre for Cancer Vaccine Science, University of Gdańsk, Gdańsk 80-822, Poland.

Publication information

Bioinform Adv. 2024 Jul 17;4(1):vbae097. doi: 10.1093/bioadv/vbae097. eCollection 2024.

DOI: 10.1093/bioadv/vbae097

PMID: 39506988

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11538020/

Abstract

SUMMARY

Knowledge graphs (KGs) are powerful tools for representing and organizing complex biomedical data. They support researchers, physicians, and scientists by providing rapid access to biomedical information, helping them discern patterns and insights, and aiding both decision-making and the generation of new knowledge. To automate these activities, several KG embedding algorithms have been proposed to learn from KGs and complete them. However, these embedding algorithms appear to be of limited efficacy when applied to biomedical KGs, which raises the question of whether they are useful in this field. To address this question, we explore several widely used KG embedding models and evaluate their performance and applications on a recent biomedical KG, BioKG. We also demonstrate that applying recent best practices for training KG embeddings can improve performance on BioKG. Additionally, we address the interpretability concerns that naturally arise with such machine learning methods. In particular, we examine rule-based methods that aim to address these concerns by making interpretable predictions with learned rules, achieving performance comparable to the embedding models. Finally, we discuss a realistic use case in which a pretrained BioKG embedding is further trained for a specific task: four polypharmacy scenarios in which the goal is to predict missing links or entities in a different, downstream KG. We conclude that, in the right scenarios, biomedical KG embeddings can be effective and useful.
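To make "link prediction" with a KG embedding concrete, here is a minimal sketch under stated assumptions. It uses the DistMult scoring function purely for illustration (it is not necessarily one of the models, or the training setup, used in the paper), hand-picked 2-dimensional vectors instead of learned embeddings, and hypothetical entity and relation names (`drug_A`, `protein_X`, `targets`). It shows how an embedding model scores candidate triples and how the standard link-prediction metrics (filtered rank, mean reciprocal rank, and Hits@k) are computed from those scores.

```python
def distmult_score(h, r, t):
    """DistMult plausibility score of a triple: sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def filtered_tail_rank(entity_emb, relation_emb, triple, known_triples):
    """Rank (1 = best) of the true tail among all entities, filtered setting.

    Other tails that form known true triples with (h, r) are skipped so they do
    not push the correct answer down. Ties are broken optimistically: only
    strictly higher scores worsen the rank.
    """
    h, r, t = triple
    hv, rv = entity_emb[h], relation_emb[r]
    true_score = distmult_score(hv, rv, entity_emb[t])
    rank = 1
    for cand, cv in entity_emb.items():
        if cand == t or (h, r, cand) in known_triples:
            continue  # skip the answer itself and other known true tails
        if distmult_score(hv, rv, cv) > true_score:
            rank += 1
    return rank


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples."""
    return sum(1.0 / k for k in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true answer is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical toy embeddings: hand-picked for illustration, not learned from BioKG.
entity_emb = {
    "drug_A": [1.0, 0.0],
    "protein_X": [1.0, 0.0],
    "protein_Y": [0.0, 1.0],
    "protein_Z": [0.5, 0.5],
}
relation_emb = {"targets": [1.0, 0.0]}

test_triples = [("drug_A", "targets", "protein_X"), ("drug_A", "targets", "protein_Z")]
known_triples = set(test_triples)  # in practice: train + validation + test triples

ranks = [filtered_tail_rank(entity_emb, relation_emb, tr, known_triples) for tr in test_triples]
print(ranks, mean_reciprocal_rank(ranks), hits_at_k(ranks, 1))  # prints: [1, 2] 0.75 0.5
```

In a real setting the vectors would be learned by training on the KG's triples, and other scoring functions (for example, translation- or rotation-based ones) can be swapped in for `distmult_score` without changing the ranking and metric code.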

AVAILABILITY AND IMPLEMENTATION

Our code and data are available at https://github.com/aryopg/biokge.


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21a/11538020/00e0bb1b8e81/vbae097f1.jpg

Similar articles

BioBLP: a modular framework for learning on multimodal biomedical knowledge graphs.

J Biomed Semantics. 2023 Dec 8;14(1):20. doi: 10.1186/s13326-023-00301-y.

FuseLinker: Leveraging LLM's pre-trained text embeddings and domain knowledge to enhance GNN-based link prediction on biomedical knowledge graphs.

J Biomed Inform. 2024 Oct;158:104730. doi: 10.1016/j.jbi.2024.104730. Epub 2024 Sep 24.

PT-KGNN: A framework for pre-training biomedical knowledge graphs with graph neural networks.

Comput Biol Med. 2024 Aug;178:108768. doi: 10.1016/j.compbiomed.2024.108768. Epub 2024 Jun 26.

Community knowledge graph abstraction for enhanced link prediction: A study on PubMed knowledge graph.

J Biomed Inform. 2024 Oct;158:104725. doi: 10.1016/j.jbi.2024.104725. Epub 2024 Sep 10.

Graph embedding on biomedical networks: methods, applications and evaluations.

Bioinformatics. 2020 Feb 15;36(4):1241-1251. doi: 10.1093/bioinformatics/btz718.

Adverse Drug Event Prediction Using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Validation.

JMIR Med Inform. 2021 Oct 25;9(10):e32730. doi: 10.2196/32730.

STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs.

Bioinformatics. 2022 Mar 4;38(6):1648-1656. doi: 10.1093/bioinformatics/btac001.

Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings.

Proc Conf Assoc Comput Linguist Meet. 2020 Jul;2020:167-176. doi: 10.18653/v1/2020.bionlp-1.18.

Ensembles of knowledge graph embedding models improve predictions for drug discovery.

Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac481.

Cited by

Bind: large-scale biological interaction network discovery through knowledge graph-driven machine learning.

J Transl Med. 2025 Jul 31;23(1):856. doi: 10.1186/s12967-025-06789-5.

Predicting Natural Product-Drug Interactions with Knowledge Graph Embeddings.

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:556-565. eCollection 2025.

Aggregating multimodal cancer data across unaligned embedding spaces maintains tumor of origin signal.

bioRxiv. 2025 May 18:2025.05.14.653900. doi: 10.1101/2025.05.14.653900.
