Identifying Protein-Nucleotide Binding Residues via Grouped Multi-task Learning and Pre-trained Protein Language Models.

Authors

Wu Jiashun, Liu Yan, Zhang Ying, Wang Xiaoyu, Yan He, Zhu Yiheng, Song Jiangning, Yu Dong-Jun

Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

School of Information Engineering, Yangzhou University, Yangzhou 225100, China.

Publication information

J Chem Inf Model. 2025 Jan 27;65(2):1040-1052. doi: 10.1021/acs.jcim.4c02092. Epub 2025 Jan 9.

DOI: 10.1021/acs.jcim.4c02092
PMID: 39788787

Abstract

The accurate identification of protein-nucleotide binding residues is crucial for protein function annotation and drug discovery. Numerous computational methods have been proposed to predict these binding residues, achieving remarkable performance. However, due to the limited availability and high variability of nucleotides, predicting binding residues for diverse nucleotides remains a significant challenge. To address these, we propose NucGMTL, a new grouped deep multi-task learning approach designed for predicting binding residues of all observed nucleotides in the BioLiP database. NucGMTL leverages pre-trained protein language models to generate robust sequence embedding and incorporates multi-scale learning along with scale-based self-attention mechanisms to capture a broader range of feature dependencies. To effectively harness the shared binding patterns across various nucleotides, deep multi-task learning is utilized to distill common representations, taking advantage of auxiliary information from similar nucleotides selected based on task grouping. Performance evaluation on benchmark data sets shows that NucGMTL achieves an average area under the Precision-Recall curve (AUPRC) of 0.594, surpassing other state-of-the-art methods. Further analyses highlight that the predominant advantage of NucGMTL can be reflected by its effective integration of grouped multi-task learning and pre-trained protein language models. The data set and source code are freely accessible at: https://github.com/jerry1984Y/NucGMTL.
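The headline metric in the abstract, the area under the Precision-Recall curve (AUPRC), can be illustrated with a small sketch. The code below computes AUPRC as step-wise average precision for per-residue binding scores and then takes an unweighted mean across nucleotide types. The function names, the toy labels and scores, and the choice of an unweighted mean are assumptions made for illustration; the abstract does not state how the average of 0.594 was aggregated, and this is not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a binding residue, 0 for a non-binding residue.
    scores: predicted binding probability for each residue.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank residues from the highest to the lowest predicted score.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)

    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Residues with tied scores share one threshold, so count them together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        # Each threshold contributes (gain in recall) x (precision there).
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def mean_auprc(per_task: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of AUPRC over tasks (assumed aggregation)."""
    values = [average_precision(y, s) for y, s in per_task.values()]
    return sum(values) / len(values)


# Hypothetical toy predictions for two nucleotide types.
toy = {
    "ATP": ([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.2, 0.1]),
    "ADP": ([0, 1, 0, 1], [0.6, 0.5, 0.4, 0.3]),
}
toy_mean = mean_auprc(toy)
```

AUPRC is a common choice for this kind of task because binding residues are typically a small minority of all residues, and precision-recall analysis is more sensitive to performance on the rare positive class than ROC-based metrics.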

Similar articles

1. Identifying Protein-Nucleotide Binding Residues via Grouped Multi-task Learning and Pre-trained Protein Language Models.
J Chem Inf Model. 2025 Jan 27;65(2):1040-1052. doi: 10.1021/acs.jcim.4c02092. Epub 2025 Jan 9.
2. Identification of Protein-Nucleotide Binding Residues With Deep Multi-Task and Multi-Scale Learning.
IEEE J Biomed Health Inform. 2025 Jul;29(7):5329-5338. doi: 10.1109/JBHI.2025.3547386.
3. Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac444.
4. Deciphering the Language of Protein-DNA Interactions: A Deep Learning Approach Combining Contextual Embeddings and Multi-Scale Sequence Modeling.
J Mol Biol. 2024 Nov 15;436(22):168769. doi: 10.1016/j.jmb.2024.168769. Epub 2024 Aug 29.
5. Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae568.
6. Predicting protein-peptide binding residues via interpretable deep learning.
Bioinformatics. 2022 Jun 27;38(13):3351-3360. doi: 10.1093/bioinformatics/btac352.
7. Prediction of protein-ATP binding residues using multi-view feature learning via contextual-based co-attention network.
Comput Biol Med. 2024 Apr;172:108227. doi: 10.1016/j.compbiomed.2024.108227. Epub 2024 Mar 4.
8. Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism.
Anal Biochem. 2024 Nov;694:115637. doi: 10.1016/j.ab.2024.115637. Epub 2024 Aug 8.
9. DP-site: A dual deep learning-based method for protein-peptide interaction site prediction.
Methods. 2024 Sep;229:17-29. doi: 10.1016/j.ymeth.2024.06.001. Epub 2024 Jun 12.
10. PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy.
Biomolecules. 2024 Sep 27;14(10):1220. doi: 10.3390/biom14101220.

Cited by

1. Predicting nucleic acid binding sites by attention map-guided graph convolutional network with protein language embeddings and physicochemical information.
Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf457.