Drug-disease association prediction using semantic graph and function similarity representation learning over heterogeneous information networks.

Author information

Zhao Bo-Wei, Su Xiao-Rui, Yang Yue, Li Dong-Xu, Li Guo-Dong, Hu Peng-Wei, Zhao Yong-Gang, Hu Lun

Affiliations

The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.

Department of Orthopaedic Surgery (hand and foot trauma), People's Hospital of Dongxihu, Wuhan 420100, China.

Publication information

Methods. 2023 Dec;220:106-114. doi: 10.1016/j.ymeth.2023.10.014. Epub 2023 Nov 14.

DOI:10.1016/j.ymeth.2023.10.014

PMID:37972913

Abstract

Discovering new indications for existing drugs is a promising development strategy at various stages of drug research and development. However, most existing computational methods approach this task by constructing a variety of heterogeneous networks without considering the higher-order connectivity patterns available in heterogeneous biological information networks, which are believed to be useful for improving the accuracy of drug discovery. To this end, we propose a computational model, called SFRLDDA, for drug-disease association prediction using semantic graph and function similarity representation learning. Specifically, SFRLDDA first constructs a heterogeneous information network (HIN) by integrating drug-disease, drug-protein, and protein-disease associations together with their biological knowledge. Second, different representation learning strategies are applied to the semantic graph and to the constructed function similarity graphs, respectively, to obtain feature representations of drugs and diseases from different perspectives. Finally, SFRLDDA uses a Random Forest classifier to discover potential drug-disease associations (DDAs). Experimental results demonstrate that SFRLDDA achieves the best performance when compared with other state-of-the-art models on three benchmark datasets. Moreover, case studies indicate that simultaneously considering the semantic graph and the function similarity of drugs and diseases in the HIN allows SFRLDDA to predict DDAs precisely and more comprehensively.
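To make the ideas of a heterogeneous information network and higher-order connectivity concrete, the following is a minimal sketch and not the SFRLDDA implementation. It assumes toy association lists with made-up identifiers, counts drug-protein-disease paths as one simple example of a higher-order pattern, and uses Jaccard overlap of shared proteins as a stand-in for function similarity; the abstract does not specify how the authors compute either quantity, and all function names here are hypothetical.

```python
from collections import defaultdict

# Toy association lists; every identifier below is invented for illustration.
drug_protein = [("drugA", "p1"), ("drugA", "p2"), ("drugB", "p2"), ("drugB", "p3")]
protein_disease = [("p1", "dis1"), ("p2", "dis1"), ("p3", "dis2")]
drug_disease = [("drugA", "dis1")]


def build_hin(drug_protein, protein_disease, drug_disease):
    """Build an undirected adjacency map over typed nodes (node_type, name)."""
    adj = defaultdict(set)
    typed_edges = (
        [(("drug", d), ("protein", p)) for d, p in drug_protein]
        + [(("protein", p), ("disease", s)) for p, s in protein_disease]
        + [(("drug", d), ("disease", s)) for d, s in drug_disease]
    )
    for u, v in typed_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def neighbors_of_type(adj, node, node_type):
    """Return the names of a node's neighbors that have the given type."""
    return {name for t, name in adj.get(node, ()) if t == node_type}


def metapath_count(adj, drug, disease):
    """Count drug-protein-disease paths (proteins linked to both endpoints)."""
    drug_proteins = neighbors_of_type(adj, ("drug", drug), "protein")
    disease_proteins = neighbors_of_type(adj, ("disease", disease), "protein")
    return len(drug_proteins & disease_proteins)


def jaccard_drug_similarity(adj, drug_a, drug_b):
    """Assumed stand-in for function similarity: Jaccard overlap of proteins."""
    a = neighbors_of_type(adj, ("drug", drug_a), "protein")
    b = neighbors_of_type(adj, ("drug", drug_b), "protein")
    union = a | b
    return len(a & b) / len(union) if union else 0.0


hin = build_hin(drug_protein, protein_disease, drug_disease)
```

In this toy network, `metapath_count(hin, "drugB", "dis1")` is nonzero even though no direct drugB-dis1 edge exists, which is the kind of indirect evidence a pair-level classifier such as a Random Forest could take as an input feature.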
