Long Yahui, Wu Min, Liu Yong, Fang Yuan, Kwoh Chee Keong, Chen Jinmiao, Luo Jiawei, Li Xiaoli
Singapore Immunology Network (SIgN), Agency for Science, Technology and Research, Singapore, Singapore.
Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore, Singapore.
Bioinformatics. 2022 Apr 12;38(8):2254-2262. doi: 10.1093/bioinformatics/btac100.
Graphs or networks are widely utilized to model the interactions between different entities (e.g. proteins, drugs, etc.) for biomedical applications. Predicting potential interactions/links in biomedical networks is important for understanding the pathological mechanisms of various complex human diseases, as well as screening compound targets for drug discovery. Graph neural networks (GNNs) have been utilized for link prediction in various biomedical networks, which rely on the node features extracted from different data sources, e.g. sequence, structure and network data. However, it is challenging to effectively integrate these data sources and automatically extract features for different link prediction tasks.
In this article, we propose a novel Pre-Training Graph Neural Networks-based framework named PT-GNN to integrate different data sources for link prediction in biomedical networks. First, we design expressive deep learning methods [e.g. convolutional neural network and graph convolutional network (GCN)] to learn features for individual nodes from sequence and structure data. Second, we propose a GCN-based encoder that refines the node features by modelling the dependencies among nodes in the network. Third, the node features are pre-trained on graph reconstruction tasks, and the pre-trained features can then be used for model initialization in downstream tasks. Extensive experiments have been conducted on two critical link prediction tasks, i.e. synthetic lethality (SL) prediction and drug-target interaction (DTI) prediction. Experimental results demonstrate that PT-GNN outperforms state-of-the-art methods for both SL prediction and DTI prediction. In addition, the pre-trained features improve the performance and reduce the training time of existing models.
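The idea of pre-training node features through graph reconstruction can be illustrated with a minimal sketch. This is not the PT-GNN implementation: it assumes a single linear graph-convolution layer as the encoder, an inner-product decoder, a binary cross-entropy reconstruction loss with self-loops included in the target, and plain gradient descent. The toy graph, the random input features and all function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_loss(z, target):
    """Mean binary cross-entropy between sigmoid(Z Z^T) and the target adjacency."""
    probs = sigmoid(z @ z.T)
    eps = 1e-9
    return float(-np.mean(target * np.log(probs + eps)
                          + (1.0 - target) * np.log(1.0 - probs + eps)))

def pretrain(adj, x, hidden_dim=3, lr=0.2, steps=300, seed=0):
    """Pre-train node embeddings Z = A_norm X W by reconstructing the graph."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    target = adj + np.eye(n)           # assumption: self-loops are part of the target
    m = normalize_adjacency(adj) @ x   # propagated input features, fixed during training
    w = 0.1 * rng.standard_normal((x.shape[1], hidden_dim))
    losses = []
    for _ in range(steps):
        z = m @ w
        losses.append(reconstruction_loss(z, target))
        # Gradient of the loss with respect to the logits S = Z Z^T.
        g = (sigmoid(z @ z.T) - target) / (n * n)
        # Chain rule: dL/dZ = (G + G^T) Z, then dL/dW = M^T dL/dZ.
        w -= lr * (m.T @ ((g + g.T) @ z))
    z = m @ w
    losses.append(reconstruction_loss(z, target))
    return z, losses

# Toy graph (hypothetical): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Random stand-ins for features that would be learned from sequence/structure data.
x = np.random.default_rng(1).standard_normal((6, 4))

embeddings, losses = pretrain(adj, x)
print(f"reconstruction loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a setting like the one described, the resulting `embeddings` would serve as the initial node features of a downstream link prediction model rather than as final predictions.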
Python code and the dataset are available at: https://github.com/longyahui/PT-GNN.
Supplementary data are available at Bioinformatics online.