Huang Dingfang, Wang Yu, Sun Yiming, Ji Wenhao, Zhang Qing, Jiang Yunya, Qiu Haodi, Liu Haichun, Lu Tao, Wei Xian, Chen Yadong, Zhang Yanmin
Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
State Key Laboratory of Natural Medicines, China Pharmaceutical University, No. 639 Longmian Dadao, Nanjing, 211198, China.
Mol Divers. 2024 Dec 23. doi: 10.1007/s11030-024-11044-y.
Protein-ligand interactions are the molecular basis of many important cellular activities, such as gene regulation, cell metabolism, and signal transduction. Protein-ligand binding affinity is a key measure of the strength of that interaction, and predicting it accurately is essential for discovering new uses for existing drugs. Although many predictive models based on machine learning and deep learning have been reported, most rely mainly on one-dimensional sequence and two-dimensional structural features of proteins and ligands, and do not fully exploit the detailed interactions between protein and ligand atoms in the three-dimensional binding pocket. In this study, we introduced a novel 4D tensor feature to capture key interactions within the binding pocket and developed a three-dimensional convolutional neural network (CNN) model based on this feature. Using ten-fold cross-validation, we identified the optimal parameter combination and pocket size. Additionally, we employed feature engineering to extract features across multiple dimensions, including one-dimensional sequences, two-dimensional structures of the ligand and protein, and three-dimensional interaction features between them. We proposed MCDTA (multi-dimensional convolutional drug-target affinity), an efficient protein-ligand binding affinity prediction model built on a multi-dimensional convolutional neural network framework. Feature ablation experiments showed that the 4D tensor feature had the largest impact on model performance. On the PDBbind v.2020 dataset, MCDTA achieved a root-mean-square error (RMSE) of 1.231 and a Pearson correlation coefficient (PCC) of 0.823. In comparative experiments, it outperformed five other mainstream binding affinity prediction models, with an RMSE of 1.349 and a PCC of 0.795.
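The abstract does not say how the 4D tensor feature is built. One common reading in 3D-CNN affinity models is a voxel grid over the binding pocket (three spatial axes) plus a channel axis for atom types. The following is a minimal sketch under that assumption, not the MCDTA implementation: the channel list, the function name `voxelize`, and the box size and resolution defaults are all hypothetical.

```python
import math

# Hypothetical channel layout (an assumption, not taken from the paper):
# one channel per (source, element) pair.
CHANNELS = [
    ("protein", "C"), ("protein", "N"), ("protein", "O"),
    ("ligand", "C"), ("ligand", "N"), ("ligand", "O"),
]


def voxelize(atoms, center, box_size=20.0, resolution=1.0):
    """Return a nested-list tensor indexed as [channel][x][y][z].

    atoms:  iterable of (source, element, (x, y, z)), coordinates in angstroms.
    center: (x, y, z) of the pocket center, e.g. the ligand centroid.
    """
    n = int(round(box_size / resolution))  # voxels per edge of the cubic box
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)]
            for _ in CHANNELS]
    half = box_size / 2.0
    for source, element, xyz in atoms:
        if (source, element) not in CHANNELS:
            continue  # atom types outside the toy channel list are ignored
        c = CHANNELS.index((source, element))
        # Shift coordinates so the box corner is the origin, then bin them.
        idx = [math.floor((p - o + half) / resolution)
               for p, o in zip(xyz, center)]
        if all(0 <= i < n for i in idx):  # drop atoms outside the pocket box
            i, j, k = idx
            grid[c][i][j][k] += 1.0  # simple occupancy count per voxel
    return grid


# Toy input, for illustration only (not data from the paper).
atoms = [
    ("ligand", "C", (0.0, 0.0, 0.0)),
    ("protein", "O", (1.5, -2.2, 3.0)),
    ("protein", "C", (50.0, 0.0, 0.0)),  # falls outside the box
]
tensor = voxelize(atoms, center=(0.0, 0.0, 0.0), box_size=8.0)
print(len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))
```

In this sketch the box size plays the role of the pocket size that the abstract reports tuning by cross-validation. A real pipeline would more likely store the grid in an array library and use richer per-atom channels.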
MCDTA also demonstrated strong generalization ability and practical screening performance across multiple benchmark datasets, highlighting its reliability and accuracy in predicting protein-ligand binding affinity. The code for MCDTA is available at https://github.com/dfhuang-AI/MCDTA.
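The RMSE and PCC figures quoted in the abstract are standard regression metrics. For readers who want to reproduce such numbers from their own predictions, here is a minimal pure-Python sketch. The helper names `rmse` and `pearson` and the sample values are illustrative assumptions, not code or data from the MCDTA repository.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted affinities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (PCC) between two equal-length lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up affinity values, for illustration only (not data from the paper).
measured = [4.0, 5.5, 6.0, 7.2, 8.1]
predicted = [4.5, 5.0, 6.4, 6.9, 8.6]
print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"PCC  = {pearson(measured, predicted):.3f}")
```

Lower RMSE means smaller prediction errors, and a PCC closer to 1 means predictions track the measured affinities more closely. The two can disagree, which is presumably why the abstract reports both.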