Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training.

Affiliations

School of Information and Software Engineering, East China Jiaotong University, No. 808 Shuanggang East Road, Nanchang 330013, China.

College of Computer Science and Electronic Engineering, Hunan University, No. 2 Lushan Road, Changsha 410082, China.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae568.

DOI: 10.1093/bib/bbae568
PMID: 39489606
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11531862/
Abstract

The functional study of proteins is a critical task in modern biology, playing a pivotal role in understanding the mechanisms of pathogenesis, developing new drugs, and discovering novel drug targets. However, existing computational models for subcellular localization face significant challenges: they either rely on known Gene Ontology (GO) annotation databases or overlook the relationship between GO annotations and subcellular localization. To address these issues, we propose DeepMTC, an end-to-end deep learning-based multi-task collaborative training model. DeepMTC integrates the interrelationship between subcellular localization and the functional annotation of proteins, leveraging multi-task collaborative training to eliminate dependence on known GO databases. This strategy gives DeepMTC a distinct advantage in predicting newly discovered proteins without prior functional annotations. First, DeepMTC leverages a high-accuracy pre-trained language model to obtain the 3D structure and sequence features of proteins. Second, it employs a graph transformer module to encode protein sequence features, addressing the problem of long-range dependencies in graph neural networks. Finally, DeepMTC uses a functional cross-attention mechanism to efficiently combine the functional features learned upstream when performing the subcellular localization task. The experimental results demonstrate that DeepMTC outperforms state-of-the-art models in both protein function prediction and subcellular localization. Moreover, interpretability experiments revealed that DeepMTC can accurately identify the key residues and functional domains of proteins, supporting its superior performance. The code and dataset of DeepMTC are freely available at https://github.com/ghli16/DeepMTC.
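The abstract names a functional cross-attention step but does not give its formulation. As a rough illustration only, the sketch below shows generic single-head scaled dot-product cross-attention, in which localization-side query vectors attend over function-side feature vectors. It is not taken from the DeepMTC repository: the function names, the toy inputs, and the choice to use the function features as both keys and values are all assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative, not DeepMTC's code).

    queries: n_q vectors of size d (here: localization-side features)
    keys:    n_k vectors of size d (here: function-side features)
    values:  n_k vectors of size d_v
    Returns (outputs, weights): n_q fused vectors and the n_q x n_k attention map.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy inputs: 2 localization queries, 3 function feature vectors.
loc_queries = [[1.0, 0.0], [0.0, 1.0]]
func_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused, attn = cross_attention(loc_queries, func_features, func_features)
```

In this toy setup each row of `attn` sums to 1, and the first query places more weight on the function features that share its non-zero dimension. A trained model would normally apply learned linear projections to the queries, keys, and values first; those are omitted here to keep the sketch short.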

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/89aa1023f6a4/bbae568f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/d8f23ec324f6/bbae568f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/4f166cf42e5b/bbae568f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/31dfa057945b/bbae568f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/8731506ec4e6/bbae568f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/783d2e8ae11b/bbae568f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f7e/11531862/89a411530509/bbae568f7.jpg
