Suppr 超能文献



S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure.

Author information

Wang Duolin, Pourmirzaei Mahdi, Abbas Usman L, Zeng Shuai, Manshour Negin, Esmaili Farzaneh, Poudel Biplab, Jiang Yuexu, Shao Qing, Chen Jin, Xu Dong

Publication information

bioRxiv. 2024 May 13:2023.08.06.552203. doi: 10.1101/2023.08.06.552203.

DOI: 10.1101/2023.08.06.552203
PMID: 37609352
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10441326/
Abstract

Proteins play an essential role in various biological and engineering processes. Large protein language models (PLMs) present excellent potential to reshape protein research by accelerating the determination of protein function and the design of proteins with the desired functions. The prediction and design capacity of PLMs relies on the representation gained from the protein sequences. However, the lack of crucial 3D structure information in most PLMs restricts the prediction capacity of PLMs in various applications, especially those heavily dependent on 3D structures. To address this issue, we introduce S-PLM, a 3D structure-aware PLM that utilizes multi-view contrastive learning to align the sequence and 3D structure of a protein in a coordinated latent space. S-PLM applies Swin-Transformer on AlphaFold-predicted protein structures to embed the structural information and fuses it into sequence-based embedding from ESM2. Additionally, we provide a library of lightweight tuning tools to adapt S-PLM for diverse protein property prediction tasks. Our results demonstrate S-PLM's superior performance over sequence-only PLMs on all protein clustering and classification tasks, achieving competitiveness comparable to state-of-the-art methods requiring both sequence and structure inputs. S-PLM and its lightweight tuning tools are available at https://github.com/duolinwang/S-PLM/ .
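The multi-view contrastive learning the abstract describes — aligning a protein's sequence embedding and structure embedding in a shared latent space — can be illustrated with a minimal CLIP-style InfoNCE loss. This is a hedged sketch, not the authors' implementation: the temperature value, embedding dimensions, and function names are illustrative assumptions, and the real model fuses Swin-Transformer structure embeddings with ESM2 sequence embeddings rather than operating on raw vectors.

```python
import math

def info_nce_loss(seq_emb, struct_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    seq_emb, struct_emb: lists of unit-normalized vectors where
    seq_emb[i] and struct_emb[i] come from the same protein.
    Matched pairs are pulled together; mismatched pairs in the
    batch serve as negatives.
    """
    n = len(seq_emb)
    # Cosine similarity matrix, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(seq_emb[i], struct_emb[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(row, target):
        # Numerically stable log-softmax cross-entropy for one row.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    # Sequence-to-structure direction plus the transposed direction.
    loss_s2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    loss_t2s = sum(cross_entropy([logits[j][i] for j in range(n)], i)
                   for i in range(n)) / n
    return (loss_s2t + loss_t2s) / 2
```

With correctly paired embeddings the loss approaches zero; shuffling the structure embeddings against the sequences drives it up, which is the signal that trains the two encoders into a coordinated latent space.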


Similar articles

1. S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure.
   bioRxiv. 2024 May 13:2023.08.06.552203. doi: 10.1101/2023.08.06.552203.
2. S-PLM: Structure-Aware Protein Language Model via Contrastive Learning Between Sequence and Structure.
   Adv Sci (Weinh). 2025 Feb;12(5):e2404212. doi: 10.1002/advs.202404212. Epub 2024 Dec 12.
3. PLM-DBPs: enhancing plant DNA-binding protein prediction by integrating sequence-based and structure-aware protein language models.
   Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf245.
4. Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models.
   J Chem Inf Model. 2024 Aug 26;64(16):6338-6349. doi: 10.1021/acs.jcim.4c00689. Epub 2024 Aug 7.
5. ProCeSa: Contrast-Enhanced Structure-Aware Network for Thermostability Prediction with Protein Language Models.
   J Chem Inf Model. 2025 Mar 10;65(5):2304-2313. doi: 10.1021/acs.jcim.4c01752. Epub 2025 Feb 23.
6. Structure-Informed Protein Language Models are Robust Predictors for Variant Effects.
   Res Sq. 2023 Aug 3:rs.3.rs-3219092. doi: 10.21203/rs.3.rs-3219092/v1.
7. Aggregating residue-level protein language model embeddings with optimal transport.
   Bioinform Adv. 2025 Mar 20;5(1):vbaf060. doi: 10.1093/bioadv/vbaf060. eCollection 2025.
8. Does protein pretrained language model facilitate the prediction of protein-ligand interaction?
   Methods. 2023 Nov;219:8-15. doi: 10.1016/j.ymeth.2023.08.016. Epub 2023 Sep 9.
9. THPLM: a sequence-based deep learning framework for protein stability changes prediction upon point variations using pretrained protein language model.
   Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad646.
10. FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking.
    bioRxiv. 2024 Jun 4:2024.06.03.597245. doi: 10.1101/2024.06.03.597245.