Interpretable drug-target affinity prediction based on pre-trained models' output as embeddings and based on structure-aware cross-attention for feature fusion.

Author information

Zheng Fang, Zhao Juanjuan, Yuan Zihang, Gao Yuanchen, Li Yafeng, Li Yaheng, Geng Yan, Qiang Yan

Affiliations

College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, 209 University Street, Yuci District, Jinzhong, 030600, China.

School of Software, Taiyuan University of Technology, 209 University Street, Yuci District, Jinzhong, 030600, China.

Publication information

Mol Divers. 2025 Apr 25. doi: 10.1007/s11030-025-11194-7.

Abstract

Protein pocket features can better capture the interaction information between proteins and small molecules, thereby improving performance on drug-target interaction (DTI) prediction tasks. However, pocket data typically must be predicted with software such as AlphaFold, which entails a massive workload for datasets of tens of thousands to hundreds of thousands of samples. Moreover, feature representation networks for 3D pocket data are computationally intensive. To address this, we propose simulating 3D pocket data from sequence data through structure-aware cross-attention feature fusion between two different objects (CASD). Precise feature representation is also a prerequisite for accurately identifying pocket information. We therefore introduce a method that uses the last-layer output of a pre-trained model as the embedding layer of a new model trained from scratch. This approach not only incorporates prior knowledge from the pre-trained model but also expands model capacity, enabling more accurate feature representation. Furthermore, we enhance the multimodal representation of small-molecule compounds with structure-aware cross-attention feature fusion within the same object (CASS), further improving feature representation. Our cross-attention mechanisms operate at the token level or node level, allowing fine-grained capture of interactions between amino acids and atoms. This makes it possible to identify the contribution score of each atom or amino acid to the task, which makes our model interpretable for drug-target prediction. Experimental validation demonstrates that our model achieves state-of-the-art predictive performance.
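To make the token-level idea concrete, the following is a minimal sketch of generic single-head scaled dot-product cross-attention between residue embeddings and atom embeddings. It is not the authors' CASD or CASS module: the structure-aware component is omitted, the embeddings are random stand-ins for the frozen last-layer outputs of pre-trained models, the projection weights are untrained, and averaging attention weights into a per-atom score is an assumption made here for illustration. All names (cross_attention, residue_emb, atom_emb, atom_scores) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head cross-attention: each query row attends over all key rows.

    Returns the fused query features and the attention matrix
    of shape (n_queries, n_keys), whose rows sum to 1.
    """
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
n_residues, n_atoms, d_model = 6, 4, 8

# Assumption: random stand-ins for frozen pre-trained last-layer outputs.
residue_emb = rng.normal(size=(n_residues, d_model))  # protein tokens
atom_emb = rng.normal(size=(n_atoms, d_model))        # molecule nodes

# Assumption: untrained random projections, for shape illustration only.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Residues query the atoms; each residue gets an atom-aware representation.
fused, attn = cross_attention(residue_emb, atom_emb, w_q, w_k, w_v)

# One simple (assumed) per-atom score: mean attention received from all residues.
atom_scores = attn.mean(axis=0)
```

Swapping the roles of the two inputs (atoms as queries, residues as keys and values) would give an analogous per-residue score. Attention weights are only one possible proxy for contribution; the abstract does not specify how the scores are computed.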

