
Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models.

Affiliations

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China.

Publication Information

J Chem Inf Model. 2024 Aug 26;64(16):6338-6349. doi: 10.1021/acs.jcim.4c00689. Epub 2024 Aug 7.

Abstract

Fine-tuning pretrained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-efficient fine-tuning, a powerful technique widely applied in natural language processing, could potentially enhance the performance of PLMs as well. However, direct transfer to life science tasks is nontrivial due to differing training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark data sets across distinct downstream tasks. Results show that, compared to vanilla PLMs, SES-Adapter improves downstream task performance by up to 11% (3% on average), accelerates convergence by up to 1034% (362% on average), and improves training efficiency by approximately 2 times. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.
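
To make the fusion step concrete, below is a minimal, hypothetical sketch of how per-residue PLM embeddings might be combined with embeddings of a discrete structural sequence (such as FoldSeek 3Di tokens) via cross-attention. The class name StructureAwareAdapter, the layer choices, and all dimensions are illustrative assumptions for this sketch, not the paper's exact architecture; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

class StructureAwareAdapter(nn.Module):
    """Hypothetical sketch: fuse per-residue PLM embeddings with
    embeddings of a discrete structural sequence (e.g., FoldSeek 3Di
    tokens) via cross-attention. Names, layer choices, and dimensions
    are illustrative assumptions, not the paper's exact design."""

    def __init__(self, plm_dim: int, struct_vocab: int,
                 hidden_dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.struct_embed = nn.Embedding(struct_vocab, hidden_dim)
        self.plm_proj = nn.Linear(plm_dim, hidden_dim)
        # PLM residues (queries) attend to structural tokens (keys/values)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, n_heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, plm_emb: torch.Tensor,
                struct_tokens: torch.Tensor) -> torch.Tensor:
        # plm_emb: (batch, seq_len, plm_dim); struct_tokens: (batch, seq_len)
        q = self.plm_proj(plm_emb)             # queries from the frozen PLM
        kv = self.struct_embed(struct_tokens)  # keys/values from structure tokens
        fused, _ = self.cross_attn(q, kv, kv)
        return self.norm(q + fused)            # residual structure-aware representation

# Usage with placeholder shapes (e.g., ESM-2 650M embeddings are 1280-d;
# 21 is an assumed structural vocabulary size including padding):
adapter = StructureAwareAdapter(plm_dim=1280, struct_vocab=21)
plm_emb = torch.randn(2, 100, 1280)             # per-residue PLM embeddings
struct_tokens = torch.randint(0, 21, (2, 100))  # discrete structure token ids
out = adapter(plm_emb, struct_tokens)           # (2, 100, 256)
```

The residual connection keeps the adapter a lightweight add-on to a frozen backbone: the PLM representation passes through largely unchanged unless the structural tokens contribute useful signal, which is consistent with the abstract's observation that even low-quality predicted structures can yield positive optimization.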
