Gema Aryo Pradipta, Grabarczyk Dominik, De Wulf Wolf, Borole Piyush, Alfaro Javier Antonio, Minervini Pasquale, Vergari Antonio, Rajan Ajitha
School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom.
International Centre for Cancer Vaccine Science, University of Gdańsk, Gdańsk 80-822, Poland.
Bioinform Adv. 2024 Jul 17;4(1):vbae097. doi: 10.1093/bioadv/vbae097. eCollection 2024.
Knowledge graphs (KGs) are powerful tools for representing and organizing complex biomedical data. They empower researchers, physicians, and scientists by providing rapid access to biomedical information, helping them discern patterns and insights, and supporting decision-making and the generation of new knowledge. To automate these activities, several KG embedding algorithms have been proposed to learn from and complete KGs. However, these embedding algorithms appear to have limited efficacy when applied to biomedical KGs, which raises the question of whether they are useful in this field. To address this question, we explore several widely used KG embedding models and evaluate their performance and applications on a recent biomedical KG, BioKG. We also demonstrate that it is possible to improve performance on BioKG by following recent best practices for training KG embeddings. Additionally, we address the interpretability concerns that naturally arise with such machine learning methods. In particular, we examine rule-based methods, which make interpretable predictions using learned rules and achieve comparable performance. Finally, we discuss a realistic use case in which a pretrained BioKG embedding is further trained for a specific task: predicting missing links or entities in other downstream KGs across four polypharmacy scenarios. We conclude that, in the right scenarios, biomedical KG embeddings can be effective and useful.
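As a rough illustration of what it means for an embedding model to "learn from and complete" a KG, the following minimal sketch scores candidate triples with DistMult, one widely used bilinear KG embedding model chosen here only for brevity (the abstract does not name the models evaluated). It then ranks the true tail entity among all candidates, which is the basis for link-prediction metrics such as mean reciprocal rank (MRR). Assumptions: the embeddings are random toy values rather than trained BioKG embeddings, the entity and relation indices are hypothetical, and ranks are computed in the "raw" setting (other known true triples are not filtered out).

```python
import numpy as np

# Hypothetical toy KG: 5 entities, 2 relations, embedding dimension 8.
# Random vectors stand in for trained embeddings (assumption, not BioKG).
rng = np.random.default_rng(0)
num_entities, num_relations, dim = 5, 2, 8
entity_emb = rng.normal(size=(num_entities, dim))
relation_emb = rng.normal(size=(num_relations, dim))


def distmult_score(h: int, r: int, t: int) -> float:
    """DistMult plausibility of (h, r, t): sum_i e_h[i] * w_r[i] * e_t[i]."""
    return float(np.sum(entity_emb[h] * relation_emb[r] * entity_emb[t]))


def tail_scores(h: int, r: int) -> np.ndarray:
    """Score the query (h, r, ?) against every entity as a candidate tail."""
    return (entity_emb[h] * relation_emb[r]) @ entity_emb.T


def rank_tail(h: int, r: int, true_t: int) -> int:
    """1-based raw rank of the true tail: 1 + number of candidates scoring higher."""
    scores = tail_scores(h, r)
    return int(1 + np.sum(scores > scores[true_t]))


def mean_reciprocal_rank(triples: list[tuple[int, int, int]]) -> float:
    """Average of 1 / rank over held-out (head, relation, tail) triples."""
    ranks = [rank_tail(h, r, t) for h, r, t in triples]
    return sum(1.0 / k for k in ranks) / len(ranks)


# Usage: evaluate two hypothetical held-out triples.
test_triples = [(0, 0, 1), (2, 1, 3)]
print("ranks:", [rank_tail(h, r, t) for h, r, t in test_triples])
print("MRR:", mean_reciprocal_rank(test_triples))
```

Published link-prediction benchmarks usually report filtered ranks, which exclude other true triples from the candidate set before ranking. The raw version above keeps the sketch short.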
Our code and data are available at https://github.com/aryopg/biokge.