College of Information Science and Engineering, Hunan Normal University, Changsha, P. R. China.
J Bioinform Comput Biol. 2024 Feb;22(1):2350030. doi: 10.1142/S0219720023500300.
Accurate identification of drug-target affinity (DTA) is crucial for drug discovery and development. Many deep learning-based approaches have been devised to predict drug-target binding affinity, with notable improvements in performance. However, existing prediction methods often fail to capture the global features of proteins. In this study, we propose a novel model called ETransDTA, designed specifically for predicting drug-target binding affinity. ETransDTA combines convolutional layers with a transformer, allowing it to extract both global and local features of target proteins simultaneously. Additionally, we integrate a new graph pooling mechanism into the topology adaptive graph convolutional network (TAGCN) to enhance its capacity for learning feature representations of chemical compounds. The proposed ETransDTA model was evaluated on the Davis and Kinase Inhibitor BioActivity (KIBA) datasets and consistently outperformed the baseline methods. On the KIBA dataset, our model achieves the lowest mean square error (MSE), 0.125, representing a 0.6% reduction compared to the lowest-performing baseline method. Furthermore, using queries, keys and values produced by the stacked convolutional neural network (CNN) enables our model to better integrate the local and global context of the protein representation, leading to further improvements in the accuracy of DTA prediction.
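The idea of feeding CNN-produced queries, keys and values into attention can be illustrated with a minimal, single-head sketch in plain Python. This is not the authors' implementation: the abstract does not give layer sizes, kernel widths, the number of heads or the depth of the CNN stack, so the function names (`conv1d_same`, `conv_attention`), the dimensions and the random weights below are all hypothetical. Each convolution sees only a small window of residues (local context), while the attention step lets every position weigh every other position (global context).

```python
import math
import random


def conv1d_same(x, w):
    """Zero-padded 1D convolution over a sequence.

    x: list of L vectors, each with C_in features.
    w: filters indexed as w[out_channel][tap][in_channel]; odd kernel width assumed.
    Returns a list of L vectors, each with C_out features.
    """
    length, c_in = len(x), len(x[0])
    k = len(w[0])
    pad = k // 2
    out = []
    for i in range(length):
        row = []
        for filt in w:
            acc = 0.0
            for t in range(k):
                j = i + t - pad
                if 0 <= j < length:  # positions outside the sequence count as zero
                    acc += sum(filt[t][c] * x[j][c] for c in range(c_in))
            row.append(acc)
        out.append(row)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for one head; returns (context, weights)."""
    d = len(q[0])
    context, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        weights.append(w)
        context.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return context, weights


def conv_attention(x, wq, wk, wv):
    """Hypothetical block: Q, K, V come from convolutions instead of linear maps."""
    q = conv1d_same(x, wq)
    k = conv1d_same(x, wk)
    v = conv1d_same(x, wv)
    return attention(q, k, v)


def random_filters(rng, c_out, width, c_in):
    """Illustrative random weights; a real model would learn these."""
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(width)]
        for _ in range(c_out)
    ]


# Toy "protein": 6 residues, each embedded in 4 dimensions (assumed sizes).
rng = random.Random(0)
seq_len, emb_dim, hidden = 6, 4, 3
x = [[rng.uniform(-1.0, 1.0) for _ in range(emb_dim)] for _ in range(seq_len)]
wq, wk, wv = (random_filters(rng, hidden, 3, emb_dim) for _ in range(3))

context, attn = conv_attention(x, wq, wk, wv)
print(len(context), len(context[0]))  # sequence length and hidden size
```

Each row of `attn` is a probability distribution over all residues, so every output position mixes information from the whole sequence even though each query, key and value was computed from a three-residue window.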