Lee Ingoo, Nam Hojung
School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005, Republic of Korea.
J Cheminform. 2022 Feb 8;14(1):5. doi: 10.1186/s13321-022-00584-w.
Identifying drug-target interactions (DTIs) is important for drug discovery, but exhaustively searching the drug-target space is a major bottleneck. Many deep learning models have recently been proposed to address this problem. However, the developers of these models have neglected interpretability in model construction, even though it is closely related to a model's performance. We hypothesized that training a model to predict important regions on a protein sequence would increase DTI prediction performance and provide a more interpretable model. We therefore constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as the DTI between them. To train the model, we collected protein-ligand complexes and the protein sequences of their binding sites, and pretrained the model to predict BRs for a given protein sequence-ligand pair using object detection with transformers. After pretraining on BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training the BR prediction model indeed improved DTI prediction performance. The proposed HoTS model performed well in BR prediction on independent test datasets even though it does not use 3D structure information. Furthermore, the HoTS model achieved the best performance in DTI prediction on test datasets. Additional analysis confirmed that attention was appropriately assigned to BRs and that transformers are important for both BR and DTI prediction. The source code is available on GitHub ( https://github.com/GIST-CSBL/HoTS ).
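Because BR prediction is framed here as object detection on a one-dimensional sequence, a predicted region can be compared with an annotated one by interval overlap. The following is a minimal sketch of intersection over union (IoU) for two sequence intervals. It is an illustration only and is not taken from the HoTS code: the name `region_iou`, the `Region` alias, and the half-open `(start, end)` convention are assumptions, and the abstract does not state which overlap criterion the authors used.

```python
from typing import Tuple

# Hypothetical representation: (start, end) residue indices, end exclusive.
Region = Tuple[int, int]


def region_iou(pred: Region, true: Region) -> float:
    """Intersection over union of two half-open intervals on a sequence."""
    # Overlap length is zero when the intervals do not intersect.
    inter = max(0, min(pred[1], true[1]) - max(pred[0], true[0]))
    # Union length = sum of both lengths minus the shared part.
    union = (pred[1] - pred[0]) + (true[1] - true[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Predicted region covers residues 10-19, annotated region 15-24.
    print(region_iou((10, 20), (15, 25)))
```

A threshold on this value is one common way in object detection to decide whether a predicted region counts as a hit; whether HoTS uses such a threshold is not stated in the abstract.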