DeepAllo: allosteric site prediction using protein language model (pLM) with multitask learning.

Author information

Khokhar Moaaz, Keskin Ozlem, Gursoy Attila

Affiliations

Department of Computer Engineering, Koç University, 34450 Istanbul, Turkey.

KUIS AI Center, Koç University, 34450 Istanbul, Turkey.

Publication information

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf294.

Abstract

MOTIVATION

Allostery, the process by which binding at one site perturbs a distant site, has become a key focus in drug development because of its substantial impact on protein function. Identifying allosteric pockets (sites) is challenging, and several techniques have been developed for the task, including machine learning methods that predict allosteric pockets from both static and pocket features.

RESULTS

DeepAllo is the first study to combine a fine-tuned protein language model (pLM) with FPocket features, and it improves allosteric site prediction performance over previous studies. The pLM was fine-tuned on the AlloSteric Database (ASD) in a multitask learning setting and then used as a feature extractor to train XGBoost and AutoML models. The best model predicts allosteric pockets with an F1 score of 89.66% and ranks 90.5% of allosteric pockets in the top 3 positions, outperforming previous results. A case study on proteins with known allosteric pockets supports the approach. In addition, the pLM's attention mechanism was visualized across allosteric and non-allosteric residues in an effort to explain the model.
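To make the two reported metrics concrete, the following is a minimal sketch of how an F1 score and a top-3 ranking rate could be computed from per-pocket scores. It is not the authors' evaluation code. The function names, the tuple layout, the 0.5 decision threshold, and the toy scores are all assumptions made for illustration, and "top 3" is read here as the fraction of allosteric pockets that rank within the three highest-scoring pockets of their own protein.

```python
from collections import defaultdict


def f1_score(labels, preds):
    """F1 for binary labels/predictions (1 = allosteric, 0 = not)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_rate(pockets, k=3):
    """Fraction of allosteric pockets ranked in the top k of their protein.

    pockets: iterable of (protein_id, score, is_allosteric) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_protein = defaultdict(list)
    for protein_id, score, is_allosteric in pockets:
        by_protein[protein_id].append((score, is_allosteric))

    hits = 0
    total = 0
    for entries in by_protein.values():
        # Rank this protein's pockets from highest to lowest score.
        ranked = sorted(entries, key=lambda e: e[0], reverse=True)
        for rank, (_, is_allosteric) in enumerate(ranked):
            if is_allosteric:
                total += 1
                if rank < k:
                    hits += 1
    return hits / total if total else 0.0


# Toy data (invented): two proteins, four candidate pockets each.
pockets = [
    ("P1", 0.91, 1), ("P1", 0.40, 0), ("P1", 0.15, 0), ("P1", 0.05, 0),
    ("P2", 0.80, 0), ("P2", 0.70, 0), ("P2", 0.60, 0), ("P2", 0.30, 1),
]

labels = [label for _, _, label in pockets]
preds = [1 if score >= 0.5 else 0 for _, score, _ in pockets]  # assumed threshold

f1 = f1_score(labels, preds)
top3 = top_k_rate(pockets, k=3)
print(f"F1 = {f1:.3f}, top-3 rate = {top3:.3f}")
```

The two metrics answer different questions: F1 scores the thresholded yes/no decision for every pocket, while the top-3 rate only asks whether a true allosteric pocket appears among the highest-ranked candidates for its protein, which is often the more practical criterion when a handful of pockets per protein will be inspected by hand.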

AVAILABILITY AND IMPLEMENTATION

The source code is available on GitHub (https://github.com/MoaazK/deepallo) and archived on Zenodo (DOI: 10.5281/zenodo.15255379). The trained model is hosted on Hugging Face (DOI: 10.57967/hf/5198). The dataset used for training and evaluation is archived on Zenodo (DOI: 10.5281/zenodo.15255437).
