Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction.

Affiliations

Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA.

Publication information

Genome Res. 2024 Oct 11;34(9):1445-1454. doi: 10.1101/gr.279132.124.

DOI:10.1101/gr.279132.124
PMID:39060029
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11529868/
Abstract

Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for the categories with limited annotated data. We present a parameter-efficient fine-tuning (PEFT) framework for SP prediction, PEFT-SP, to effectively utilize pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the protein sequence evolutionary knowledge of PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, with a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SPs with small training samples and an overall MCC gain of 6.1%. We also employed two other PEFT methods, prompt tuning and adapter tuning, in ESM-2 for SP prediction. More elaborate experiments show that PEFT-SP using adapter tuning can also improve on the state-of-the-art results, with up to 28.1% MCC gain for SPs with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during the training stage, making it possible to adapt larger and more powerful protein models for SP prediction.
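LoRA, as named in the abstract, keeps a pretrained weight matrix frozen and learns only a low-rank correction to it. The following is a minimal, framework-free sketch of that general idea in plain Python; it is not the PEFT-SP code. The class `LoRALinear`, the helpers `lora_param_count` and `full_param_count`, the rank `r`, the scaling `alpha / r`, and the zero-initialised `B` factor are assumptions that follow the common LoRA convention, not details stated in this abstract. The layer computes h = W0 x + (alpha / r) · B (A x).

```python
import random


def matvec(M, x):
    """Multiply a matrix M (list of rows) by a vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_param_count(d_out, d_in, r):
    """Trainable values in the LoRA factors B (d_out x r) and A (r x d_in)."""
    return r * (d_in + d_out)


def full_param_count(d_out, d_in):
    """Trainable values if the whole d_out x d_in matrix were fine-tuned."""
    return d_out * d_in


class LoRALinear:
    """Hypothetical LoRA-wrapped linear map: h = W0 x + (alpha / r) * B (A x).

    W0 stays frozen; only A and B would receive gradient updates during
    training (no training loop is included in this sketch).
    """

    def __init__(self, W0, r, alpha, seed=0):
        self.W0 = W0  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(W0), len(W0[0])
        rng = random.Random(seed)
        # A gets a small random init; B starts at zero, so B @ A = 0 and the
        # wrapped layer initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W0, x)                  # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank update path
        return [b + self.scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: same as W0 @ x at init
    print(lora_param_count(1280, 1280, 8), full_param_count(1280, 1280))
```

As a rough illustration, for a square 1280 × 1280 projection (1280 is the hidden size of the 650M-parameter ESM-2 checkpoint) and an assumed rank of 8, `lora_param_count(1280, 1280, 8)` gives 20,480 trainable values versus 1,638,400 for the full matrix, about 1.25%. Fewer trainable values also means less optimizer state, which is consistent with the abstract's point that LoRA needs less training memory, though this arithmetic is not a measurement from the paper.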

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c903/11529868/285e410ed9a1/1445f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c903/11529868/49cb1fcab338/1445f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c903/11529868/f57f3d97c96a/1445f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c903/11529868/f05b0cb70e18/1445f04.jpg

Similar articles

1
Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction.
Genome Res. 2024 Oct 11;34(9):1445-1454. doi: 10.1101/gr.279132.124.
2
Democratizing protein language models with parameter-efficient fine-tuning.
Proc Natl Acad Sci U S A. 2024 Jun 25;121(26):e2405840121. doi: 10.1073/pnas.2405840121. Epub 2024 Jun 20.
3
Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning.
bioRxiv. 2023 Nov 10:2023.11.09.566187. doi: 10.1101/2023.11.09.566187.
4
Parameter Efficient Fine-tuning of Transformer-based Masked Autoencoder Enhances Resource Constrained Neuroimage Analysis.
bioRxiv. 2025 Feb 20:2025.02.15.638442. doi: 10.1101/2025.02.15.638442.
5
Assessing parameter efficient methods for pre-trained language model in annotating scRNA-seq data.
Methods. 2024 Aug;228:12-21. doi: 10.1016/j.ymeth.2024.05.007. Epub 2024 May 15.
6
Efficiency at scale: Investigating the performance of diminutive language models in clinical tasks.
Artif Intell Med. 2024 Nov;157:103002. doi: 10.1016/j.artmed.2024.103002. Epub 2024 Oct 23.
7
Refocus the Attention for Parameter-Efficient Thermal Infrared Object Tracking.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9538-9549. doi: 10.1109/TNNLS.2024.3420928. Epub 2025 May 2.
8
Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images.
Med Image Anal. 2024 Oct;97:103258. doi: 10.1016/j.media.2024.103258. Epub 2024 Jul 4.
9
Fine-tuning protein language models boosts predictions across diverse tasks.
Nat Commun. 2024 Aug 28;15(1):7407. doi: 10.1038/s41467-024-51844-2.
10
Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation.
Sensors (Basel). 2025 Apr 11;25(8):2417. doi: 10.3390/s25082417.

Cited by

1
ProtLoc-GRPO: Cell line-specific subcellular localization prediction using a graph-based model and reinforcement learning.
bioRxiv. 2025 Jul 22:2025.07.17.665451. doi: 10.1101/2025.07.17.665451.
2
In silico prediction of variant effects: promises and limitations for precision plant breeding.
Theor Appl Genet. 2025 Jul 28;138(8):193. doi: 10.1007/s00122-025-04973-1.
3
OPUS-BFactor: Predicting Protein B-Factor with Sequence and Structure Information.
Molecules. 2025 Jun 12;30(12):2570. doi: 10.3390/molecules30122570.
4
Artificial Intelligence-Assisted Breeding for Plant Disease Resistance.
Int J Mol Sci. 2025 Jun 1;26(11):5324. doi: 10.3390/ijms26115324.
5
Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.
Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.
6
SELFprot: Effective and Efficient Multitask Finetuning Methods for Protein Parameter Prediction.
J Chem Inf Model. 2025 Apr 14;65(7):3226-3238. doi: 10.1021/acs.jcim.4c02230. Epub 2025 Mar 17.
7
Leveraging large language models for peptide antibiotic design.
Cell Rep Phys Sci. 2025 Jan 15;6(1). doi: 10.1016/j.xcrp.2024.102359. Epub 2024 Dec 31.
8
PEZy-miner: An artificial intelligence driven approach for the discovery of plastic-degrading enzyme candidates.
Metab Eng Commun. 2024 Sep 5;19:e00248. doi: 10.1016/j.mec.2024.e00248. eCollection 2024 Dec.

References

1
Evolutionary-scale prediction of atomic-level protein structure with a language model.
Science. 2023 Mar 17;379(6637):1123-1130. doi: 10.1126/science.ade2574. Epub 2023 Mar 16.
2
SignalP 6.0 predicts all five types of signal peptides using protein language models.
Nat Biotechnol. 2022 Jul;40(7):1023-1025. doi: 10.1038/s41587-021-01156-3. Epub 2022 Jan 3.
3
MULocDeep: A deep-learning framework for protein subcellular and suborganellar localization prediction with residue-level interpretation.
Comput Struct Biotechnol J. 2021 Aug 18;19:4825-4839. doi: 10.1016/j.csbj.2021.08.027. eCollection 2021.
4
ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7112-7127. doi: 10.1109/TPAMI.2021.3095381. Epub 2022 Sep 14.
5
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.
Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.
6
Targeting of proteins to the twin-arginine translocation pathway.
Mol Microbiol. 2020 May;113(5):861-871. doi: 10.1111/mmi.14461. Epub 2020 Feb 20.
7
SignalP 5.0 improves signal peptide predictions using deep neural networks.
Nat Biotechnol. 2019 Apr;37(4):420-423. doi: 10.1038/s41587-019-0036-z. Epub 2019 Feb 18.
8
A comprehensive review of signal peptides: Structure, roles, and applications.
Eur J Cell Biol. 2018 Aug;97(6):422-441. doi: 10.1016/j.ejcb.2018.06.003. Epub 2018 Jun 22.
9
Signal peptides for recombinant protein secretion in bacterial expression systems.
Microb Cell Fact. 2018 Mar 29;17(1):52. doi: 10.1186/s12934-018-0901-3.
10
DeepSig: deep learning improves signal peptide detection in proteins.
Bioinformatics. 2018 May 15;34(10):1690-1696. doi: 10.1093/bioinformatics/btx818.