A Survey of Pretrained Protein Language Models.

Author information

Pokharel Suresh, Pratyush Pawel, Chaudhari Meenal, Heinzinger Michael, Caragea Doina, Saigo Hiroto, Kc Dukka B

Affiliations

Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, USA.

College of Applied Sciences and Technology, Illinois State University, Normal, IL, USA.

Publication information

Methods Mol Biol. 2025;2941:1-29. doi: 10.1007/978-1-0716-4623-6_1.

Abstract

Inspired by the transformative success of large language models (LLMs) in natural language processing (NLP), numerous protein language models (PLMs) have recently emerged, revolutionizing the field of protein bioinformatics. PLMs have demonstrated remarkable achievements in representing proteins and designing new ones. Trained on vast datasets of proteins, they capture intrinsic structural and functional information and have demonstrated exceptional performance across a variety of bioinformatics tasks, including classification, function prediction, and de novo protein design. This chapter explores the evolution of PLMs, tracing their origins from NLP-based transformers and LLMs. A comprehensive summary of notable PLMs is presented, with a particular focus on encoder-only, encoder-decoder, and decoder-only architectures. Additionally, we delve into cutting-edge trends in PLM applications, such as fine-tuning methods, multimodal architectures, and the use of reduced alphabets. These innovations underscore the growing potential of PLMs to tackle complex biological problems and drive future breakthroughs in the field.
