
Enhancing Structure-Aware Protein Language Models with Efficient Fine-Tuning for Various Protein Prediction Tasks.

Author Information

Zhang Yichuan, Qin Yongfang, Pourmirzaei Mahdi, Shao Qing, Wang Duolin, Xu Dong

Affiliations

Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.

Chemical & Materials Engineering, University of Kentucky, Lexington, KY, USA.

Publication Information

Methods Mol Biol. 2025;2941:31-58. doi: 10.1007/978-1-0716-4623-6_2.

DOI: 10.1007/978-1-0716-4623-6_2
PMID: 40601249
Abstract

Proteins are crucial in a wide range of biological and engineering processes. Large protein language models (PLMs) can significantly advance our understanding and engineering of proteins. However, the effectiveness of PLMs in prediction and design largely depends on the representations derived from protein sequences. Without incorporating the three-dimensional (3D) structures of proteins, PLMs would overlook crucial aspects of how proteins interact with other molecules, thereby limiting their predictive accuracy. To address this issue, we present S-PLM, a 3D structure-aware PLM that employs multi-view contrastive learning to align protein sequences with their 3D structures in a unified latent space. Previously, we utilized a contact map-based approach to encode structural information, applying the Swin-Transformer to contact maps derived from AlphaFold-predicted protein structures. This work introduces a new approach that leverages a geometric vector perceptron (GVP) model to process 3D coordinates and obtain structural embeddings. We focus on the application of structure-aware models to protein-related tasks, utilizing efficient fine-tuning methods to achieve optimal performance without significant computational cost. Our results show that S-PLM outperforms sequence-only PLMs across all protein clustering and classification tasks, achieving performance on par with state-of-the-art methods that require both sequence and structure inputs. S-PLM and its tuning tools are available at https://github.com/duolinwang/S-PLM/.
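The multi-view contrastive objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-protein sequence and structure embeddings have already been produced by the two encoders, and uses a symmetric CLIP-style InfoNCE loss (a common choice for aligning two views in a shared latent space; the function name and batch layout here are hypothetical).

```python
import torch
import torch.nn.functional as F

def contrastive_align(seq_emb, struct_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning per-protein sequence embeddings
    with structure embeddings in a shared latent space (CLIP-style).

    seq_emb, struct_emb: (batch, dim) tensors; row i of each tensor
    corresponds to the same protein (the positive pair).
    """
    # L2-normalize both views so the dot product is cosine similarity.
    seq = F.normalize(seq_emb, dim=-1)
    struct = F.normalize(struct_emb, dim=-1)
    # Pairwise similarity matrix: (batch, batch) logits.
    logits = seq @ struct.t() / temperature
    # Matching sequence/structure pairs lie on the diagonal.
    targets = torch.arange(seq.size(0), device=seq.device)
    loss_s2t = F.cross_entropy(logits, targets)      # sequence -> structure
    loss_t2s = F.cross_entropy(logits.t(), targets)  # structure -> sequence
    return 0.5 * (loss_s2t + loss_t2s)
```

Minimizing this loss pulls each protein's sequence and structure embeddings together while pushing apart embeddings of different proteins in the batch, which is the alignment behavior the abstract describes.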

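The abstract emphasizes efficient fine-tuning that avoids retraining the full PLM. One widely used parameter-efficient technique is LoRA (low-rank adaptation); the sketch below shows the general idea under that assumption — it is illustrative, not the specific tuning tools shipped with S-PLM, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a trainable low-rank
    update: y = W x + (alpha/r) * B A x. Only A and B are trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze pretrained weights
        # A is small random; B starts at zero, so the wrapped layer
        # initially behaves exactly like the frozen base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus low-rank trainable correction.
        return self.base(x) + self.scale * (x @ self.lora_a.t() @ self.lora_b.t())
```

Because only the rank-r matrices train, the number of updated parameters per layer drops from `in_features * out_features` to `r * (in_features + out_features)`, which is what keeps fine-tuning cheap.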

Similar Articles

1
Enhancing Structure-Aware Protein Language Models with Efficient Fine-Tuning for Various Protein Prediction Tasks.
Methods Mol Biol. 2025;2941:31-58. doi: 10.1007/978-1-0716-4623-6_2.
2
Large Language Model (LLM)-Based Advances in Prediction of Post-translational Modification Sites in Proteins.
Methods Mol Biol. 2025;2941:313-355. doi: 10.1007/978-1-0716-4623-6_19.
3
S-PLM: Structure-Aware Protein Language Model via Contrastive Learning Between Sequence and Structure.
Adv Sci (Weinh). 2025 Feb;12(5):e2404212. doi: 10.1002/advs.202404212. Epub 2024 Dec 12.
4
Boost Protein Language Model with Injected Structure Information Through Parameter Efficient Fine-tuning.
Comput Biol Med. 2025 Sep;195:110607. doi: 10.1016/j.compbiomed.2025.110607. Epub 2025 Jun 30.
5
S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure.
bioRxiv. 2024 May 13:2023.08.06.552203. doi: 10.1101/2023.08.06.552203.
6
MTPrompt-PTM: A Multi-Task Method for Post-Translational Modification Prediction Using Prompt Tuning on a Structure-Aware Protein Language Model.
Biomolecules. 2025 Jun 9;15(6):843. doi: 10.3390/biom15060843.
7
Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
8
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
9
Stigma Management Strategies of Autistic Social Media Users.
Autism Adulthood. 2025 May 28;7(3):273-282. doi: 10.1089/aut.2023.0095. eCollection 2025 Jun.
10
MoRF_ESM: Prediction of MoRFs in disordered proteins based on a deep transformer protein language model.
J Bioinform Comput Biol. 2024 Apr;22(2):2450006. doi: 10.1142/S0219720024500069. Epub 2024 May 28.
