Department of Computer Science, City University of Hong Kong, Hong Kong SAR.
Department of Pathology, Harvard Medical School, Boston, USA.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac446.
The identification of drug-target interactions (DTIs) plays a vital role in in silico drug discovery, where the drug is a chemical molecule and the target is the set of protein residues in the binding pocket. Manual DTI annotation remains reliable; however, exhaustively testing each drug-target pair is notoriously laborious and time-consuming. Recently, the rapid growth of labelled DTI data has catalysed interest in high-throughput DTI prediction. Unfortunately, existing prediction methods rely heavily on manually engineered features, which can introduce errors.
Here, we developed an end-to-end deep learning framework called CoaDTI to significantly improve the efficiency and interpretability of drug-target annotation. CoaDTI uses a co-attention mechanism to model the interaction information between the drug modality and the protein modality. In particular, CoaDTI uses a transformer to learn protein representations from raw amino acid sequences and GraphSage to extract molecular graph features from SMILES. Furthermore, to address the scarcity of labelled data, we propose a transfer learning strategy that encodes protein features with a pre-trained transformer. The experimental results demonstrate that CoaDTI achieves competitive performance on three public datasets compared with state-of-the-art models, and that the transfer learning strategy further boosts performance. An extended study reveals that CoaDTI can identify novel DTIs, such as interactions between candidate drugs and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-associated proteins. Visualization of the co-attention scores illustrates the interpretability of the model and offers mechanistic insights.
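To make the co-attention idea concrete, the following is a minimal NumPy sketch of one common form of co-attention between per-atom drug features and per-residue protein features. It is an illustration under stated assumptions, not the CoaDTI implementation: the function name `co_attention`, the bilinear weight `w`, the tanh affinity, the feature dimension, and the random inputs are all hypothetical, and in the real model the inputs would come from the GraphSage and transformer encoders.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def co_attention(drug, protein, w):
    """Generic bilinear co-attention (assumed form, not the paper's exact layer).

    drug:    (n_atoms, d) per-atom features
    protein: (n_residues, d) per-residue features
    w:       (d, d) bilinear weight, learned in a real model
    """
    # Affinity between every atom and every residue: (n_atoms, n_residues)
    affinity = np.tanh(drug @ w @ protein.T)
    # Each atom distributes attention over residues (rows sum to 1)
    atom_over_residues = softmax(affinity, axis=1)
    # Each residue distributes attention over atoms (columns sum to 1)
    residue_over_atoms = softmax(affinity, axis=0)
    # Protein-aware drug features and drug-aware protein features
    drug_context = atom_over_residues @ protein        # (n_atoms, d)
    protein_context = residue_over_atoms.T @ drug      # (n_residues, d)
    return drug_context, protein_context, atom_over_residues, residue_over_atoms


# Toy inputs standing in for encoder outputs (random, for shape checking only)
rng = np.random.default_rng(0)
drug = rng.normal(size=(5, 8))       # 5 atoms, 8-dim features
protein = rng.normal(size=(12, 8))   # 12 residues, 8-dim features
w = 0.1 * rng.normal(size=(8, 8))

drug_context, protein_context, atom_attn, residue_attn = co_attention(drug, protein, w)
print(drug_context.shape, protein_context.shape)
```

In a sketch like this, the attention matrices `atom_attn` and `residue_attn` are the quantities one would plot as a heat map to inspect which atom-residue pairs the model weights most heavily.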
Source code is publicly available at https://github.com/Layne-Huang/CoaDTI.