Tang Chunyan, Zhong Cheng, Wang Mian, Zhou Fengfeng
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1030-1040. doi: 10.1109/TCBB.2022.3172340. Epub 2023 Apr 3.
Identifying interactions between compounds and proteins is an essential task in drug discovery. For recommending compounds as new drug candidates, computational approaches cost less than wet-lab experiments. Machine learning-based methods, especially deep learning-based methods, have advantages in learning complex feature interactions between compounds and proteins. However, when the compound-protein feature interactions are high-dimensional and sparse, deep learning models over-generalize and predict less relevant compound-protein pairs. This problem can be overcome by learning both low-order and high-order feature interactions. In this paper, we propose FMGNN, a novel hybrid model that combines Factorization Machines and a Graph Neural Network to extract the low-order and high-order features, respectively. We then design a compound-protein interaction (CPI) prediction method that uses pharmacophore features of compounds and physicochemical properties of amino acids. The pharmacophore features help ensure that the prediction results better match the expectations of biological experiments, and the physicochemical properties of amino acids are loaded into the embedding layer to improve the convergence speed and accuracy of protein feature learning. Experimental results on several datasets, especially on an imbalanced large-scale dataset, show that our proposed method outperforms existing methods for CPI prediction. Western blot results on wogonin and its candidate target proteins also show that our proposed method is effective and accurate for finding target proteins. The program implementing FMGNN is available at https://github.com/tcygxu2021/FMGNN.
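To make the "low-order feature interactions" mentioned in the abstract concrete, the following is a minimal sketch of a generic second-order Factorization Machine score in plain Python. It is not taken from the FMGNN repository and does not reproduce the authors' architecture; the function names (`fm_score`, `fm_score_naive`), the parameter names, and the toy values are assumptions made purely for illustration. It shows the standard formulation, in which every pair of features interacts through the dot product of two learned latent vectors, so pairwise weights can be estimated even when the input is sparse.

```python
from itertools import combinations


def fm_score(x, w0, w, V):
    """Second-order Factorization Machine score for one feature vector.

    x  : list of n feature values (mostly zeros when features are sparse)
    w0 : global bias
    w  : list of n first-order weights
    V  : n x k list of latent factor vectors, one row per feature

    Illustrative only: a textbook FM, not the FMGNN implementation.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        # Per latent dimension: 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score via the explicit sum over feature pairs, for checking."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        score += dot * x[i] * x[j]
    return score


if __name__ == "__main__":
    # Toy, made-up parameters; a real model would learn w0, w and V from data.
    x = [1.0, 0.0, 2.0]
    w0, w = 0.5, [0.1, 0.2, 0.3]
    V = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
    print(fm_score(x, w0, w, V), fm_score_naive(x, w0, w, V))
```

The first function uses the usual algebraic rearrangement that costs O(nk) rather than O(n²k); the second spells out the pairwise sum and serves only as a cross-check.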