Department of Physics, University of Science and Technology of China, Hefei, China.
Department of Physics, City University of Hong Kong, Hong Kong, China.
J Comput Chem. 2024 Dec 15;45(32):2929-2940. doi: 10.1002/jcc.27499. Epub 2024 Sep 2.
Predicting protein-ligand binding affinity is a crucial and challenging task in structure-based drug discovery. With the accumulation of complex structures and binding affinity data, various machine-learning scoring functions, particularly those based on deep learning, have been developed for this task and have outperformed their traditional counterparts. This work proposes a fusion model that sequentially connects a graph neural network (GNN) and a convolutional neural network (CNN) to predict protein-ligand binding affinity. In this model, the intermediate outputs of the GNN layers, which serve as supplementary descriptors of atomic chemical environments at different levels, are concatenated with the input features of the CNN. The model demonstrates a noticeable improvement in performance on the CASF-2016 benchmark compared to its constituent CNN models. The generalization ability of the model is evaluated by setting a series of thresholds for ligand extended-connectivity fingerprint similarity or protein sequence similarity between the training and test sets. A masking experiment reveals that the model can capture key interaction regions. Furthermore, the fusion model is applied to a virtual screening task for a novel target, PI5P4Kα, where the fusion strategy significantly improves the ability of the constituent CNN model to identify active compounds. This work offers a novel approach to enhancing the accuracy of deep learning models in predicting binding affinity through fusion strategies.
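The fusion step described in the abstract, in which per-atom outputs of successive GNN layers are appended to the features later consumed by the CNN, can be illustrated with a toy sketch. This is not the authors' implementation: the mean-aggregation layer, the function names `message_pass` and `fused_atom_features`, and the three-atom example are all assumptions made for illustration, and the abstract does not specify how the fused atom features are arranged into the CNN input.

```python
from typing import Dict, List

Vector = List[float]


def message_pass(features: List[Vector], neighbors: Dict[int, List[int]]) -> List[Vector]:
    """One toy GNN layer (assumed): average each atom's vector with its neighbors' vectors."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in neighbors.get(i, [])]
        updated.append([sum(column) / len(group) for column in zip(*group)])
    return updated


def fused_atom_features(
    features: List[Vector], neighbors: Dict[int, List[int]], num_layers: int = 2
) -> List[Vector]:
    """Concatenate the raw atom features with the output of every GNN layer.

    Each extra layer describes a wider chemical environment around the atom,
    so the fused vector carries descriptors at several levels.
    """
    fused = [list(f) for f in features]  # copy so the input is not modified
    current = features
    for _ in range(num_layers):
        current = message_pass(current, neighbors)
        for atom_vec, layer_vec in zip(fused, current):
            atom_vec.extend(layer_vec)
    return fused


# Hypothetical example: three atoms in a chain 0-1-2, two features per atom.
atom_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
bonds = {0: [1], 1: [0, 2], 2: [1]}
fused = fused_atom_features(atom_features, bonds, num_layers=2)
```

With two layers and two raw features, each atom ends up with a six-dimensional vector: the raw features followed by the layer-1 and layer-2 outputs. In a real pipeline these fused vectors would replace the plain atom features when building the CNN input, and the GNN layers would be learned rather than fixed averages.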