

PLMFit: benchmarking transfer learning with protein language models for protein engineering

Authors

Bikias Thomas, Stamkopoulos Evangelos, Reddy Sai T

Affiliations

Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.

Botnar Institute of Immune Engineering, Basel, Switzerland.

Publication

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf381.

DOI: 10.1093/bib/bbaf381
PMID: 40736745
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12309243/
Abstract

Protein language models (PLMs) have emerged as a useful resource for protein engineering applications. Transfer learning (TL) leverages pre-trained parameters either to extract features for training machine learning models or to adjust the weights of PLMs for novel tasks via fine-tuning (FT) through back-propagation. TL methods have shown potential for enhancing protein prediction performance when paired with PLMs; however, there is a notable lack of comparative analyses that benchmark TL methods applied to state-of-the-art PLMs, identify optimal strategies for transferring knowledge, and determine the most suitable approach for specific tasks. Here, we report PLMFit, a benchmarking study that combines three state-of-the-art PLMs (ESM2, ProGen2, ProteinBert) with three TL methods (feature extraction, low-rank adaptation, bottleneck adapters) across five protein engineering datasets. We conducted over 3150 in silico experiments, varying PLM sizes and layers, TL hyperparameters, and training procedures. Our experiments reveal three key findings: (i) utilizing a partial fraction of a PLM for TL does not detrimentally impact performance; (ii) the choice between feature extraction (FE) and fine-tuning is primarily dictated by the amount and diversity of data; and (iii) FT is most effective when generalization is necessary and only limited data are available. We provide PLMFit as an open-source software package, serving as a valuable resource for the scientific community to facilitate the FE and FT of PLMs for various applications.

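The feature-extraction (FE) route benchmarked in the abstract can be sketched as follows: per-residue embeddings from a frozen PLM are mean-pooled into one fixed-length vector per sequence, which then trains a small downstream model. This is a minimal illustrative sketch, not PLMFit's actual API; `embed_residue` and `EMB_DIM` are dummy stand-ins for a real PLM layer such as ESM2's.

```python
# Sketch of the feature-extraction (FE) transfer-learning route: a frozen PLM
# yields per-residue embeddings, which are mean-pooled into a sequence-level
# feature vector for a downstream predictor. embed_residue() is a deterministic
# dummy standing in for a real PLM; EMB_DIM is illustrative (real PLMs use
# hundreds to thousands of dimensions).
import hashlib

EMB_DIM = 8  # hypothetical embedding width

def embed_residue(aa: str, position: int) -> list[float]:
    """Deterministic dummy per-residue embedding (stand-in for a frozen PLM)."""
    digest = hashlib.sha256(f"{aa}:{position}".encode()).digest()
    return [b / 255.0 for b in digest[:EMB_DIM]]

def extract_features(sequence: str) -> list[float]:
    """Mean-pool per-residue embeddings into one fixed-length feature vector."""
    per_residue = [embed_residue(aa, i) for i, aa in enumerate(sequence)]
    n = len(per_residue)
    return [sum(vec[d] for vec in per_residue) / n for d in range(EMB_DIM)]

features = extract_features("MKTAYIAKQR")
print(len(features))  # → 8: fixed length regardless of sequence length
```

Because the PLM stays frozen, FE is cheap: embeddings can be computed once and reused across downstream models, which is why the study finds data volume and diversity, not compute, drive the FE-versus-FT choice.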

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f75b/12309243/67d96c626e69/bbaf381f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f75b/12309243/b99fc592adf7/bbaf381f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f75b/12309243/b73b5a4770a0/bbaf381f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f75b/12309243/3e4866cd354a/bbaf381f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f75b/12309243/ac2cb1057cf0/bbaf381f5.jpg

Similar Articles

1. PLMFit: benchmarking transfer learning with protein language models for protein engineering.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf381.
2. A Benchmarking Platform for Assessing Protein Language Models on Function-Related Prediction Tasks.
Methods Mol Biol. 2025;2947:241-268. doi: 10.1007/978-1-0716-4662-5_14.
3. Enhancing Structure-Aware Protein Language Models with Efficient Fine-Tuning for Various Protein Prediction Tasks.
Methods Mol Biol. 2025;2941:31-58. doi: 10.1007/978-1-0716-4623-6_2.
4. Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.
JMIR AI. 2024 Aug 6;3:e56932. doi: 10.2196/56932.
5. Benchmarking protein language models for protein crystallization.
Sci Rep. 2025 Jan 18;15(1):2381. doi: 10.1038/s41598-025-86519-5.
6. Advancing the accuracy of clathrin protein prediction through multi-source protein language models.
Sci Rep. 2025 Jul 8;15(1):24403. doi: 10.1038/s41598-025-08510-4.
7. The Black Book of Psychotropic Dosing and Monitoring.
Psychopharmacol Bull. 2024 Jul 8;54(3):8-59.
8. Sexual Harassment and Prevention Training.
9. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
10. Short-Term Memory Impairment.

Cited By

1. Scaling down for efficiency: Medium-sized protein language models perform well at transfer learning on realistic datasets.
bioRxiv. 2025 Jan 28:2024.11.22.624936. doi: 10.1101/2024.11.22.624936.

References

1. Simulating 500 million years of evolution with a language model.
Science. 2025 Feb 21;387(6736):850-858. doi: 10.1126/science.ads0018. Epub 2025 Jan 16.
2. Fine-tuning protein language models boosts predictions across diverse tasks.
Nat Commun. 2024 Aug 28;15(1):7407. doi: 10.1038/s41467-024-51844-2.
3. Structure-informed protein language models are robust predictors for variant effects.
Hum Genet. 2025 Mar;144(2-3):209-225. doi: 10.1007/s00439-024-02695-w. Epub 2024 Aug 8.
4. Democratizing protein language models with parameter-efficient fine-tuning.
Proc Natl Acad Sci U S A. 2024 Jun 25;121(26):e2405840121. doi: 10.1073/pnas.2405840121. Epub 2024 Jun 20.
5. Using protein language models for protein interaction hot spot prediction with limited data.
BMC Bioinformatics. 2024 Mar 16;25(1):115. doi: 10.1186/s12859-024-05737-2.
6. Generative models for protein structures and sequences.
Nat Biotechnol. 2024 Feb;42(2):196-199. doi: 10.1038/s41587-023-02115-w.
7. Designing proteins with language models.
Nat Biotechnol. 2024 Feb;42(2):200-202. doi: 10.1038/s41587-024-02123-4.
8. Protein language models can capture protein quaternary state.
BMC Bioinformatics. 2023 Nov 14;24(1):433. doi: 10.1186/s12859-023-05549-w.
9. Enhancing antibody affinity through experimental sampling of non-deleterious CDR mutations predicted by machine learning.
Commun Chem. 2023 Nov 9;6(1):244. doi: 10.1038/s42004-023-01037-7.
10. Efficient evolution of human antibodies from general protein language models.
Nat Biotechnol. 2024 Feb;42(2):275-283. doi: 10.1038/s41587-023-01763-2. Epub 2023 Apr 24.