

Boost Protein Language Model with Injected Structure Information Through Parameter Efficient Fine-tuning.

Authors

Zhang Zixun, Zhou Yuzhe, Zheng Jiayou, Feng Chunmei, Cui Shuguang, Wang Sheng, Li Zhen

Affiliations

FNii-Shenzhen, 2001 Longxiang Boulevard, Longgang District, Shenzhen, 518172, Guangdong, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Boulevard, Longgang District, Shenzhen, 518172, Guangdong, China.

Institute of High Performance Computing, A*STAR, 1 Fusionopolis Way #16-16 Connexis, Singapore, 138632, Singapore.

Publication

Comput Biol Med. 2025 Sep;195:110607. doi: 10.1016/j.compbiomed.2025.110607. Epub 2025 Jun 30.

DOI: 10.1016/j.compbiomed.2025.110607
PMID: 40592174
Abstract

Large-scale Protein Language Models (PLMs), such as the Evolutionary Scale Modeling (ESM) family, have significantly advanced our understanding of protein structures and functions. These models have shown immense potential in biomedical applications, including drug discovery, protein design, and understanding disease mechanisms at the molecular level. However, PLMs are typically pre-trained on residue sequences alone, with limited incorporation of structural information, presenting opportunities for further enhancement. In this paper, we propose Structure Information Injecting Tuning (SI-Tuning), a parameter-efficient fine-tuning method, to integrate structural information into PLMs. SI-Tuning maintains the original model parameters in a frozen state while optimizing task-specific vectors for input embedding and attention maps. Structural features, including dihedral angles and distance maps, are used to derive this vector, injecting the structural information that improves model performance in downstream tasks. Extensive experiments on 650M ESM-2 demonstrate the effectiveness of our SI-Tuning across multiple downstream tasks. Specifically, our SI-Tuning achieves an accuracy of 93.95% on DeepLoc binary classification, and 76.05% on Metal Ion Binding, outperforming SaProt, a large-scale pre-trained PLM with structural modeling. SI-Tuning effectively enhances the performance of PLMs by incorporating structural information in a parameter-efficient manner. Our method not only advances downstream task performance, but also offers significant computational efficiency, making it a valuable strategy for applying large-scale PLM to broad biomedical downstream applications. Code is available at https://github.com/Nocturne0256/SI-tuning.
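The mechanism the abstract describes — frozen model parameters, a tunable vector added to the input embeddings (derived from dihedral angles), and a tunable bias added to the attention maps (derived from the distance map) — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' implementation: the function name, shapes, and the way the structural biases enter the computation are assumptions made for illustration only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def si_tuning_attention(embeddings, struct_embed_bias, dist_bias):
    """Hypothetical sketch of SI-Tuning-style structure injection.

    embeddings        -- frozen per-residue embeddings, L x d (never updated)
    struct_embed_bias -- tunable vectors (e.g. derived from dihedral angles),
                         L x d, added to the frozen input embeddings
    dist_bias         -- tunable attention bias (e.g. derived from the residue
                         distance map), L x L, added to attention logits
                         before the softmax
    Returns the attention-weighted output, L x d.
    """
    L = len(embeddings)
    d = len(embeddings[0])
    # Inject structural information into the input; frozen weights untouched.
    h = [[embeddings[i][k] + struct_embed_bias[i][k] for k in range(d)]
         for i in range(L)]
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(L):
        # Scaled dot-product logits plus the distance-map-derived bias.
        logits = [scale * sum(h[i][k] * h[j][k] for k in range(d)) + dist_bias[i][j]
                  for j in range(L)]
        w = softmax(logits)
        out.append([sum(w[j] * h[j][k] for j in range(L)) for k in range(d)])
    return out
```

With both biases set to zero this reduces to plain self-attention, which matches the abstract's framing: only the small injected vectors are optimized per task, while the 650M-parameter backbone stays frozen.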


Similar Articles

1
Boost Protein Language Model with Injected Structure Information Through Parameter Efficient Fine-tuning.
Comput Biol Med. 2025 Sep;195:110607. doi: 10.1016/j.compbiomed.2025.110607. Epub 2025 Jun 30.
2
Enhancing Structure-Aware Protein Language Models with Efficient Fine-Tuning for Various Protein Prediction Tasks.
Methods Mol Biol. 2025;2941:31-58. doi: 10.1007/978-1-0716-4623-6_2.
3
Fine-tuning medical language models for enhanced long-contextual understanding and domain expertise.
Quant Imaging Med Surg. 2025 Jun 6;15(6):5450-5462. doi: 10.21037/qims-2024-2655. Epub 2025 Jun 3.
4
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5
Shaping pre-trained language models for task-specific embedding generation via consistency calibration.
Neural Netw. 2025 Nov;191:107754. doi: 10.1016/j.neunet.2025.107754. Epub 2025 Jun 21.
6
Survivor, family and professional experiences of psychosocial interventions for sexual abuse and violence: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2022 Oct 4;10(10):CD013648. doi: 10.1002/14651858.CD013648.pub2.
7
Home treatment for mental health problems: a systematic review.
Health Technol Assess. 2001;5(15):1-139. doi: 10.3310/hta5150.
8
Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.
9
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.
Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.
10
Factors that impact on the use of mechanical ventilation weaning protocols in critically ill adults and children: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2016 Oct 4;10(10):CD011812. doi: 10.1002/14651858.CD011812.pub2.