Institute of Biomedical Sciences and School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China.
College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
Bioinformatics. 2022 Apr 12;38(8):2235-2245. doi: 10.1093/bioinformatics/btac085.
Knowledge graphs (KGs) are becoming increasingly important in the biomedical field. Deriving new and reliable knowledge from existing knowledge with KG embedding technology is a cutting-edge approach. Some methods add a variety of additional information to aid reasoning, an approach known as multimodal reasoning. However, few works based on existing biomedical KGs focus on specific diseases.
This work develops a construction and multimodal reasoning process for Specific Disease Knowledge Graphs (SDKGs). We construct SDKG-11, an SDKG set comprising five cancers, six non-cancer diseases, a combined Cancer5 and a combined Diseases11, aiming to discover new reliable knowledge and provide universal pre-trained knowledge for each specific disease field. SDKG-11 is obtained through original triplet extraction, standard entity set construction, entity linking and relation linking. We implement multimodal reasoning for SDKGs by reverse-hyperplane projection based on structure, category and description embeddings. Multimodal reasoning improves pre-existing models on all SDKGs, with the entity prediction task as the evaluation protocol. We verify the model's reliability in discovering new knowledge by manually proofreading predicted drug-gene, gene-disease and disease-drug pairs. Using the embedding results as initialization parameters for biomolecular interaction classification, we demonstrate the universality of the embedding models.
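To make the entity prediction protocol concrete, the following is a minimal sketch of how a tail-entity ranking evaluation is commonly set up for KG embeddings. It is not the authors' implementation: the TransE-style distance used as the scoring function, the toy embeddings, and the entity and relation names (`drug_A`, `gene_B`, `targets`, and so on) are all assumptions chosen for illustration, and the reverse-hyperplane projection itself is not reproduced here.

```python
import math


def transe_score(h, r, t):
    """TransE-style distance ||h + r - t||_2 (assumed scorer); lower is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(h, r, true_tail, entity_emb, known_tails=()):
    """Return the rank of the true tail among all candidate tails (1 is best).

    Candidates listed in known_tails (other correct answers) are skipped,
    which corresponds to the usual "filtered" evaluation setting.
    """
    true_score = transe_score(h, r, entity_emb[true_tail])
    rank = 1
    for name, emb in entity_emb.items():
        if name == true_tail or name in known_tails:
            continue
        if transe_score(h, r, emb) < true_score:
            rank += 1
    return rank


def summarize(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits


# Hypothetical 2-dimensional embeddings, for illustration only.
entity_emb = {
    "drug_A": [0.0, 0.0],
    "gene_B": [1.0, 0.0],
    "gene_C": [0.0, 1.0],
    "disease_D": [3.0, 3.0],
}
relation_emb = {"targets": [1.0, 0.0]}

# Two test triplets: (drug_A, targets, gene_B) and (drug_A, targets, gene_C).
ranks = [
    tail_rank(entity_emb["drug_A"], relation_emb["targets"], "gene_B", entity_emb),
    tail_rank(entity_emb["drug_A"], relation_emb["targets"], "gene_C", entity_emb),
]
mrr, hits_at_1 = summarize(ranks, k=1)
print(ranks, mrr, hits_at_1)
```

In a multimodal setting, the vectors passed to the scorer would be whatever combined representation the model produces from structure, category and description embeddings; the ranking and the summary metrics stay the same.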
The constructed SDKG-11 and the TensorFlow implementation are available from https://github.com/ZhuChaoY/SDKG-11.
Supplementary data are available at Bioinformatics online.