Lin Xuan, Quan Zhe, Wang Zhi-Jie, Guo Yan, Zeng Xiangxiang, Yu Philip S
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):932-943. doi: 10.1109/TCBB.2022.3198003. Epub 2023 Apr 3.
Effectively identifying compound-protein interactions (CPIs) is crucial for new drug design and is an important step in in silico drug discovery. Current machine learning methods for CPI prediction mainly use one-dimensional (1D) compound/protein strings and/or specific descriptors. However, they often ignore the fact that molecules are essentially modeled as molecular graphs. We observe that in real-world scenarios, the topological structure of the molecular graph usually provides an overview of how the atoms are connected, while the local chemical context reveals the functionality of the protein sequence in CPI. These two types of information are complementary, and both are significant for modeling compound-protein pairs. Motivated by this, we propose an end-to-end deep learning framework named GraphCPI, which captures the structural information of compounds and leverages the chemical context of protein sequences to solve the CPI prediction task. Our framework can integrate any popular graph neural network for learning compound representations and combines it with a convolutional neural network for embedding protein sequences. To compare our method with classic and state-of-the-art deep learning methods, we conduct extensive experiments on several widely used CPI datasets. The experimental results show the feasibility and competitiveness of our proposed method.
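The abstract describes the architecture only at a high level: a graph neural network encodes the compound's molecular graph, a convolutional neural network encodes the protein sequence, and the two are combined for end-to-end prediction. The following is a minimal, untrained sketch of that two-branch idea in plain Python. It assumes one-hot atom features, a single mean-aggregation graph convolution with mean pooling, a single 1D convolution with global max pooling over residue embeddings, and concatenation followed by a logistic output. These choices, and all names (`gnn_embed`, `cnn_embed`, `predict_cpi`, the dimensions), are hypothetical illustrations, not GraphCPI's published configuration; the weights are random, so the score carries no meaning until the model is trained.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained sketch is reproducible

def rand_matrix(rows, cols):
    """Random weight matrix (untrained, for illustration only)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

# --- Compound branch (hypothetical): one mean-aggregation graph conv + mean pooling ---
ATOMS = ["C", "N", "O", "S"]  # assumed toy atom vocabulary

def gnn_embed(atoms, edges, w):
    """atoms: element symbols; edges: undirected (i, j) bonds; w: hidden x len(ATOMS)."""
    x = [[1.0 if a == s else 0.0 for s in ATOMS] for a in atoms]  # one-hot node features
    nbrs = {i: {i} for i in range(len(atoms))}  # each node includes itself (self-loop)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    h = []
    for i in range(len(atoms)):
        # Average the features of the node and its bonded neighbors, then transform.
        agg = [sum(x[j][k] for j in nbrs[i]) / len(nbrs[i]) for k in range(len(ATOMS))]
        h.append(relu(matvec(w, agg)))
    # Graph-level readout: average the node embeddings (order-independent).
    return [sum(col) / len(h) for col in zip(*h)]

# --- Protein branch (hypothetical): residue embeddings + 1D conv + global max pooling ---
AMINO = "ACDEFGHIKLMNPQRSTVWY"

def cnn_embed(seq, emb, kernels, k):
    """emb: one vector per amino acid; kernels: one flattened (k * emb_dim) filter per channel."""
    vecs = [emb[AMINO.index(r)] for r in seq]
    # Concatenate each run of k consecutive residue vectors into one window.
    windows = [sum(vecs[i:i + k], []) for i in range(len(vecs) - k + 1)]
    # Each filter slides over all windows (ReLU applied); keep its maximum response.
    return [max(max(0.0, sum(a * b for a, b in zip(f, win))) for win in windows)
            for f in kernels]

def predict_cpi(atoms, edges, seq, params):
    c = gnn_embed(atoms, edges, params["w_gnn"])
    p = cnn_embed(seq, params["emb"], params["kernels"], params["k"])
    z = sum(a * b for a, b in zip(params["w_out"], c + p)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> interaction score in (0, 1)

DIM, EMB, K = 8, 4, 3  # assumed toy sizes
params = {
    "w_gnn": rand_matrix(DIM, len(ATOMS)),
    "emb": rand_matrix(len(AMINO), EMB),
    "kernels": rand_matrix(DIM, K * EMB),
    "k": K,
    "w_out": rand_matrix(1, 2 * DIM)[0],
    "b_out": 0.0,
}

# Heavy-atom graph C-C-O (ethanol-like) paired with a short peptide.
score = predict_cpi(["C", "C", "O"], [(0, 1), (1, 2)], "MKTAYIAKQR", params)
print(f"{score:.3f}")
```

Because the compound branch is isolated in `gnn_embed`, swapping in a different message-passing layer leaves the protein branch and the output head unchanged, which mirrors the abstract's point that the framework can plug in different graph neural networks.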