Zhang Yufang, Li Jiayi, Lin Shenggeng, Zhao Jianwei, Xiong Yi, Wei Dong-Qing
School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, 200240, China.
Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China.
J Cheminform. 2024 Jun 7;16(1):67. doi: 10.1186/s13321-024-00862-9.
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are increasingly popular for identifying compound-protein interactions efficiently and at high throughput, narrowing the set of candidates that must be tested with traditional experimental techniques, which are labor-intensive, time-consuming, and expensive. In this study, we propose an end-to-end approach termed SPVec-SGCN-CPI, which uses a simplified graph convolutional network (SGCN) model together with low-dimensional, continuous features generated by our previously developed model SPVec and graph topology information to predict compound-protein interactions. The SGCN technique separates local neighborhood aggregation from nonlinear layer-wise propagation, so it aggregates K-order neighbor information effectively while avoiding neighbor explosion and speeding up training. The performance of SPVec-SGCN-CPI was assessed on three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results showed that SPVec-SGCN-CPI outperformed all of these competing methods, and it performed particularly well in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneous information. Furthermore, our method scored all unlabeled data in ChEMBL, and the five top-ranked compound-protein interactions were confirmed through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions among unlabeled compound-protein pairs, with substantial implications for drug re-profiling and discovery.
In summary, SPVec-SGCN accurately predicts compound-protein interactions and shows potential to enhance target identification and streamline drug discovery.
Scientific contributions
The methodology presented in this work not only enables comparatively accurate prediction of compound-protein interactions but also, for the first time, simultaneously takes into account sample imbalance, which is very common in real-world data, and computational efficiency, thereby accelerating target identification and drug discovery.
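The separation of neighborhood aggregation from the nonlinearity described in the abstract can be illustrated with a minimal sketch of the general simplified-GCN idea: precompute K rounds of normalized-adjacency propagation once, then fit a plain linear classifier on the result. This is not the authors' implementation. The toy graph, the two-dimensional features (standing in for SPVec embeddings), the choice K = 2, and all function names are assumptions made for illustration.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return S = D^-1/2 (A + I) D^-1/2 (self-loops added, symmetric normalization)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(adj, features, k):
    """Aggregate K-hop neighbor information as S^K X, with no nonlinearity between hops."""
    s = normalized_adjacency(adj)
    out = np.asarray(features, dtype=float)
    for _ in range(k):
        out = s @ out
    return out


def train_logistic(x, y, lr=0.5, epochs=500):
    """Fit logistic regression by gradient descent on the precomputed features."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y  # gradient of the log-loss with respect to the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(x, w, b):
    """Return hard 0/1 labels from the fitted linear model."""
    return (x @ w + b >= 0.0).astype(int)


def toy_graph():
    """Hypothetical graph: two 3-node cliques joined by one edge (2-3)."""
    adj = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        adj[i, j] = adj[j, i] = 1.0
    x = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)  # placeholder node features
    y = np.array([0, 0, 0, 1, 1, 1])                   # placeholder labels
    return adj, x, y


if __name__ == "__main__":
    adj, x, y = toy_graph()
    z = propagate(adj, x, k=2)  # computed once, before any training
    w, b = train_logistic(z, y)
    print(predict(z, w, b))
```

Because the propagation step has no trainable parameters, it runs once before training, and each training step touches only a single weight vector. This is one plausible reading of how the approach avoids neighbor explosion and shortens training time.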