Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China.
School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian 116600, China.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad560.
Accurate prediction of drug-target binding affinity (DTA) is crucial for drug discovery. The growing number of published large-scale DTA datasets has enabled the development of various computational methods for DTA prediction. Numerous deep learning-based methods have been proposed to predict affinities, some of which use only raw sequence information or complex structures; however, the effective combination of diverse information with protein-binding pockets has not been fully explored. Therefore, a new method that integrates the available key information is urgently needed to predict DTA and accelerate the drug discovery process.
In this study, we propose a novel deep learning-based predictor, termed DataDTA, to estimate the affinities of drug-target pairs. DataDTA takes as inputs the descriptors of predicted pockets and the sequences of proteins, together with low-dimensional molecular features and SMILES strings of compounds. Specifically, the pockets were predicted from the three-dimensional structures of the proteins, and their descriptors were extracted as part of the input features for DTA prediction. Molecular representations of the compounds based on algebraic graph features were collected to complement the input information of the targets. Furthermore, to ensure effective learning of multiscale interaction features, a dual-interaction aggregation neural network strategy was developed. DataDTA was compared with state-of-the-art methods on different datasets, and the results showed that DataDTA is a reliable tool for affinity estimation. On the test dataset, DataDTA achieves a concordance index (CI) of 0.806 and a Pearson correlation coefficient (R) of 0.814, both higher than those of the compared methods.
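The two reported metrics can be illustrated with a minimal sketch. This is not the DataDTA evaluation code. It assumes the pairwise definition of CI commonly used in DTA benchmarks (the fraction of pairs with different true affinities whose predictions are ordered the same way, with tied predictions counted as 0.5) and the standard Pearson formula. The function names and the toy affinity values are hypothetical.

```python
import math
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Assumption: pairs with equal true affinity are skipped, and pairs with
    equal predictions contribute 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # not comparable: no true ordering to recover
        comparable += 1
        # Orient the pair so that p_i belongs to the larger true affinity.
        if t_i < t_j:
            p_i, p_j = p_j, p_i
        if p_i > p_j:
            concordant += 1.0
        elif p_i == p_j:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Hypothetical affinities for four drug-target pairs (not data from the paper).
    true_affinity = [5.0, 6.2, 7.1, 8.4]
    predicted = [5.3, 6.0, 7.5, 7.2]
    print("CI:", concordance_index(true_affinity, predicted))
    print("R:", pearson_r(true_affinity, predicted))
```

In the toy example, five of the six comparable pairs are ranked in the correct order, so the CI is 5/6. CI measures only ranking quality, whereas R also reflects how linearly the predictions track the measured affinities, which is why the two metrics are typically reported together.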
The code and datasets of DataDTA are available at https://github.com/YanZhu06/DataDTA.