Zhang Haiping, Liao Linbu, Saravanan Konda Mani, Yin Peng, Wei Yanjie
Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China.
PeerJ. 2019 Jul 25;7:e7362. doi: 10.7717/peerj.7362. eCollection 2019.
Proteins interact with small molecules to modulate several important cellular functions. Many acute diseases have been cured by small molecules binding to the active site of a protein, either inhibiting or activating it. Currently, several docking programs are available to estimate the binding position and binding orientation of a protein-ligand complex. Many scoring functions have been developed to estimate binding strength and predict effective protein-ligand binding. However, the accuracy of current scoring functions is limited in several respects: the solvent effect, entropy effect, and multibody effect are largely ignored in traditional machine learning methods. In this paper, we propose a new deep neural network-based model, named DeepBindRG, to predict the binding affinity of protein-ligand complexes. It implicitly learns all these effects, as well as binding mode and specificity, from protein-ligand interface contact information in a large protein-ligand dataset. During the initial data processing step, the critical interface information was preserved so that the input is suitable for the proposed deep learning model. On three independent validation datasets, DeepBindRG achieves a root mean squared error (RMSE) of about 1.6-1.8 in pKa (-logKd or -logKi) and an R value of around 0.5-0.6. This is better than AutoDock Vina, whose RMSE is about 2.2-2.4 and whose R value is 0.42-0.57. We also explored in detail the reasons behind DeepBindRG's performance, especially for several cases on which Vina failed. Furthermore, DeepBindRG performed better on four challenging datasets from the DUD.E database for which no experimental protein-ligand complex structures are available. DeepBindRG's better performance than AutoDock Vina in predicting protein-ligand binding affinity indicates that deep learning approaches can greatly help the drug discovery process.
We also compared the performance of DeepBindRG with "pafnucy", a 4D-based deep learning method. The advantages and limitations of both methods provide clues for improving deep learning-based protein-ligand prediction models in the future.
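The RMSE and R figures quoted above are standard regression metrics computed on pKa values. The following is a minimal Python sketch of how such numbers can be computed. It assumes that K is given in mol/L, that pKa here means -log10(Kd) or -log10(Ki) rather than an acid-dissociation constant, and that R denotes the Pearson correlation coefficient. The function names and the sample values are hypothetical illustrations and are not taken from the paper's code or data.

```python
import math
from typing import Sequence


def pka_from_molar(k_molar: float) -> float:
    """Convert a dissociation (Kd) or inhibition (Ki) constant in mol/L to -log10(K)."""
    if k_molar <= 0:
        raise ValueError("K must be a positive concentration in mol/L")
    return -math.log10(k_molar)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error between predicted and experimental pKa values."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and standard deviations share the same 1/n factor, so it cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical experimental constants (mol/L) and model predictions (pKa units).
    measured_k = [1e-9, 2.5e-7, 4e-6, 1e-4]
    experimental = [pka_from_molar(k) for k in measured_k]
    predicted = [8.1, 6.9, 5.0, 4.6]
    print(f"RMSE = {rmse(predicted, experimental):.2f}")
    print(f"R    = {pearson_r(predicted, experimental):.2f}")
```

Pearson R is invariant to shifting and scaling the predictions. A model can therefore reach a high R and still have a large RMSE if its predictions are systematically offset. Reporting both metrics, as the abstract does, guards against that.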