A Survey of Pretrained Protein Language Models

Authors

Pokharel Suresh, Pratyush Pawel, Chaudhari Meenal, Heinzinger Michael, Caragea Doina, Saigo Hiroto, Kc Dukka B

Affiliations

Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, USA.

College of Applied Sciences and Technology, Illinois State University, Normal, IL, USA.

Publication information

Methods Mol Biol. 2025;2941:1-29. doi: 10.1007/978-1-0716-4623-6_1.

DOI: 10.1007/978-1-0716-4623-6_1
PMID: 40601248

Abstract

Inspired by the transformative success of large language models (LLMs) in natural language processing (NLP), numerous protein language models (PLMs) have recently emerged, revolutionizing the field of protein bioinformatics. PLMs have demonstrated remarkable achievements in representing proteins and designing new ones. Trained on vast datasets of proteins, they capture intrinsic structural and functional information and have demonstrated exceptional performance across a variety of bioinformatics tasks, including classification, function prediction, and de novo protein design. This chapter explores the evolution of PLMs, tracing their origins to NLP-based transformers and LLMs. A comprehensive summary of notable PLMs is presented, with a particular focus on encoder-only, encoder-decoder, and decoder-only architectures. Additionally, we delve into cutting-edge trends in PLM applications, such as fine-tuning methods, multimodal architectures, and the use of reduced alphabets. These innovations underscore the growing potential of PLMs to tackle complex biological problems and drive future breakthroughs in the field.
