Rezaei Mohammad A, Li Yanjun, Wu Dapeng, Li Xiaolin, Li Chenglong
IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):407-417. doi: 10.1109/TCBB.2020.3046945. Epub 2022 Feb 3.
Computational drug design relies on calculating the binding strength between two biological counterparts, typically a chemical compound (i.e., a ligand) and a protein. Predicting protein-ligand binding affinity with reasonable accuracy is crucial for drug discovery, and it enables compounds to be optimized for better interaction with their target protein. In this paper, we propose a data-driven framework named DeepAtom to accurately predict protein-ligand binding affinity. Using a 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom can automatically extract binding-related atomic interaction patterns from the voxelized complex structure. Compared with other CNN-based approaches, our lightweight model design effectively improves the model's representational capacity, even with the limited training data available. We carried out validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set. We demonstrate that the DeepAtom approach, which depends less on feature engineering, consistently outperforms the other baseline scoring methods. We also compile and propose a new benchmark dataset to further improve model performance. With the new dataset as training input, DeepAtom achieves Pearson's R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. These promising results demonstrate that DeepAtom models could potentially be adopted in computational drug development protocols such as molecular docking and virtual screening.
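The two figures of merit reported above, Pearson's R and RMSE in pK units, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `pearson_r` and `rmse` are hypothetical, and the affinity values are made-up placeholders rather than PDBbind data. It only shows how the two metrics are conventionally computed from experimental and predicted affinities.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Center both series, then divide the covariance term by the
    # product of the two standard-deviation terms.
    dt = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    return float((dt * dp).sum() / np.sqrt((dt ** 2).sum() * (dp ** 2).sum()))


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (here pK)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Placeholder values for illustration only (assumed, not from the paper).
    experimental_pk = [4.0, 5.5, 6.2, 7.8, 9.1]
    predicted_pk = [4.5, 5.0, 6.8, 7.1, 8.7]
    r = pearson_r(experimental_pk, predicted_pk)
    err = rmse(experimental_pk, predicted_pk)
    print(f"Pearson R = {r:.2f}, RMSE = {err:.2f} pK units")
```

Pearson's R measures how well predicted affinities track the experimental ones linearly, while RMSE measures the typical size of the prediction error, so the two are usually reported together for scoring functions.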