

PatchProt: hydrophobic patch prediction using protein foundation models.

Author information

Gogishvili Dea, Minois-Genin Emmanuel, van Eck Jan, Abeln Sanne

Affiliations

Bioinformatics, Computer Science Department, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, The Netherlands.

AI Technology for Life, Department of Computing and Information Sciences, Department of Biology, Utrecht University, Utrecht, 3584 CS, The Netherlands.

Publication information

Bioinform Adv. 2024 Oct 14;4(1):vbae154. doi: 10.1093/bioadv/vbae154. eCollection 2024.

Abstract

MOTIVATION

Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has proven to be a difficult task. Fine-tuning foundation models allows a model to be adapted to the specific nuances of a new task using a much smaller dataset. Additionally, multitask deep learning offers a promising solution for addressing data gaps while simultaneously outperforming single-task methods.

RESULTS

In this study, we harnessed a recently released, leading large language model, Evolutionary Scale Models (ESM-2). Efficient fine-tuning of ESM-2 was achieved with a recently developed parameter-efficient fine-tuning method. This approach enabled comprehensive training of the model layers without an excessive number of trainable parameters and without the need for a computationally expensive multiple sequence analysis. We explored several related tasks, at the local (residue) and global (protein) levels, to improve the representation of the model. As a result, our model, PatchProt, not only predicts hydrophobic patch areas but also outperforms existing methods on the primary tasks, including secondary structure and surface accessibility prediction. Importantly, our analysis shows that including related local tasks can improve predictions on the more difficult global tasks. This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models while enriching the model representation by training on related tasks.

AVAILABILITY AND IMPLEMENTATION

https://github.com/Deagogishvili/chapter-multi-task.
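To make the setup described in the Results section more concrete, below is a minimal multitask fine-tuning sketch in PyTorch. It assumes LoRA (via the Hugging Face `peft` library) as the parameter-efficient fine-tuning method, the `transformers` ESM-2 35M checkpoint, and illustrative task heads (3-state secondary structure and relative solvent accessibility as local tasks, and a global hydrophobic patch area regression); the authors' actual architecture, task set, and hyperparameters are in the repository linked above.

```python
# Minimal multitask sketch in the spirit of PatchProt (not the authors' code).
# Assumptions: LoRA as the parameter-efficient method, ESM-2 35M backbone,
# and illustrative local/global task heads.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from peft import LoraConfig, get_peft_model

CHECKPOINT = "facebook/esm2_t12_35M_UR50D"

class MultiTaskPatchModel(nn.Module):
    def __init__(self, checkpoint: str = CHECKPOINT):
        super().__init__()
        backbone = AutoModel.from_pretrained(checkpoint)
        lora_cfg = LoraConfig(
            r=8, lora_alpha=16, lora_dropout=0.1,
            target_modules=["query", "value"],  # attention projections in ESM-2
        )
        # Freezes the backbone weights; only LoRA adapters remain trainable.
        self.backbone = get_peft_model(backbone, lora_cfg)
        hidden = backbone.config.hidden_size
        # Local (per-residue) heads: 3-state secondary structure and RSA.
        self.ss3_head = nn.Linear(hidden, 3)
        self.rsa_head = nn.Linear(hidden, 1)
        # Global (per-protein) head: largest hydrophobic patch area (regression).
        self.patch_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        h = self.backbone(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
        # Mean-pool residue embeddings into a single protein-level vector
        # (special tokens are included here for simplicity).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (h * mask).sum(dim=1) / mask.sum(dim=1)
        return {
            "ss3": self.ss3_head(h),                    # (batch, seq_len, 3)
            "rsa": self.rsa_head(h).squeeze(-1),        # (batch, seq_len)
            "patch_area": self.patch_head(pooled).squeeze(-1),  # (batch,)
        }

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = MultiTaskPatchModel()
batch = tokenizer(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"], return_tensors="pt")
with torch.no_grad():
    preds = model(batch["input_ids"], batch["attention_mask"])
print({k: v.shape for k, v in preds.items()})
```

In a multitask training loop, the per-residue and per-protein losses would be summed (possibly with task weights), so the shared LoRA-adapted backbone learns a representation that serves both the easier local tasks and the harder global patch-area task, while the number of trainable parameters stays small.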


Figure 1 (vbae154f1): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d422/11525051/09fe578600b0/vbae154f1.jpg
