
PharmKG: a dedicated knowledge graph benchmark for biomedical data mining.

Affiliations

School of Data and Computer Science at the Sun Yat-Sen University.

School of Systems Science and Engineering at the Sun Yat-Sen University.

Publication information

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa344.

Abstract

Biomedical knowledge graphs (KGs), which help in understanding complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, embedding and using them remains challenging because of their complexity and the specific demands of their construction. Existing studies often suffer from sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge this gap. We introduced PharmKG, a multi-relational, attributed biomedical KG composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is annotated with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. As baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new biological, intuitive, graph neural network-based KGE method that combines global network structure with heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provided insights and guidelines on how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.
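To make the idea of embedding a multi-relational KG and scoring it with link-prediction metrics more concrete, here is a minimal sketch. It rests on assumptions throughout: the abstract does not name the nine baseline models or the evaluation metrics, so a TransE-style translation score, mean reciprocal rank (MRR) and Hits@k are used purely as common illustrations. The triples, entity names and 2-d embeddings are hypothetical toy values, not PharmKG data.

```python
import math

# Hypothetical toy triples (head, relation, tail); not taken from PharmKG.
triples = [
    ("drug_a", "treats", "disease_x"),
    ("gene_g", "associated_with", "disease_x"),
    ("drug_a", "targets", "gene_g"),
]

# Hypothetical hand-picked 2-d embeddings, chosen so that head + relation = tail.
entity_emb = {
    "drug_a": [0.0, 0.0],
    "gene_g": [1.0, 0.0],
    "disease_x": [1.0, 1.0],
}
relation_emb = {
    "treats": [1.0, 1.0],
    "associated_with": [0.0, 1.0],
    "targets": [1.0, 0.0],
}


def transe_score(head, relation, tail):
    """TransE-style score: negative L2 norm of (h + r - t); higher is more plausible."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(head, relation, true_tail):
    """1-based rank of the true tail among all entities (unfiltered setting)."""
    true_score = transe_score(head, relation, true_tail)
    better = sum(
        1 for candidate in entity_emb
        if transe_score(head, relation, candidate) > true_score
    )
    return better + 1


def mrr(ranks):
    """Mean reciprocal rank over a list of 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true tail is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [tail_rank(h, r, t) for h, r, t in triples]
print("ranks:", ranks)
print("MRR:", mrr(ranks))
print("Hits@1:", hits_at_k(ranks, 1))
```

In a real benchmark the embeddings would be learned from training triples and the ranks computed on held-out triples, usually after filtering out other known true tails. The sketch skips both steps to keep the scoring and metric logic visible.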

