DeepAllo: allosteric site prediction using protein language model (pLM) with multitask learning.

Author information

Khokhar Moaaz, Keskin Ozlem, Gursoy Attila

Affiliations

Department of Computer Engineering, Koç University, 34450 Istanbul, Turkey.

KUIS AI Center, Koç University, 34450 Istanbul, Turkey.

Publication information

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf294.

DOI: 10.1093/bioinformatics/btaf294

PMID: 40372465

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12145174/

Abstract

MOTIVATION

Allostery, the process by which binding at one site perturbs a distant site, has become a key focus in drug development because of its substantial impact on protein function. Identifying allosteric pockets (sites) is challenging, and several techniques have been developed for it, including machine learning methods that predict allosteric pockets from both static and pocket features.

RESULTS

Our work, DeepAllo, is the first study to combine a fine-tuned protein language model (pLM) with FPocket features, and it improves allosteric site prediction performance over previous studies. The pLM was fine-tuned on the AlloSteric Database (ASD) in a multitask learning setting and then used as a feature extractor to train XGBoost and AutoML models. The best model achieves an F1 score of 89.66% and places 90.5% of allosteric pockets within the top 3 ranked positions, outperforming previous results. A case study on proteins with known allosteric pockets provides evidence for the approach. In addition, we attempt to explain the pLM by visualizing its attention mechanism across allosteric and non-allosteric residues.
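
The abstract does not spell out how pocket-level features are built from the pLM output. The sketch below shows one plausible scheme, assuming per-residue pLM embeddings are mean-pooled over the residues FPocket assigns to a pocket and then concatenated with FPocket's pocket descriptors (for example, volume or druggability score). The function name `pocket_feature_vector`, the mean-pooling choice, and the toy dimensions are hypothetical and are not taken from DeepAllo's code.

```python
from typing import List, Sequence


def pocket_feature_vector(
    residue_embeddings: Sequence[Sequence[float]],
    pocket_residue_indices: Sequence[int],
    fpocket_descriptors: Sequence[float],
) -> List[float]:
    """Hypothetical pocket featurization (not DeepAllo's published code).

    residue_embeddings: one pLM embedding vector per residue of the protein.
    pocket_residue_indices: 0-based positions of the residues lining one pocket.
    fpocket_descriptors: pocket-level numbers reported by FPocket.
    """
    if not pocket_residue_indices:
        raise ValueError("pocket has no residues")
    dim = len(residue_embeddings[0])
    # Sum the embeddings of the residues lining the pocket.
    pooled = [0.0] * dim
    for idx in pocket_residue_indices:
        for d, value in enumerate(residue_embeddings[idx]):
            pooled[d] += value
    # Divide by the residue count so pockets of any size give a fixed-length mean vector.
    n = len(pocket_residue_indices)
    pooled = [v / n for v in pooled]
    # Append the FPocket descriptors to form the final feature row.
    return pooled + list(fpocket_descriptors)


if __name__ == "__main__":
    # Toy protein with 3 residues and 2-dimensional embeddings (real pLM embeddings are far wider).
    embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
    row = pocket_feature_vector(embeddings, [1, 2], [0.5, 12.0])
    print(row)  # [4.0, 3.0, 0.5, 12.0]
```

Each returned vector would be one row of a training matrix; stacking the rows for all candidate pockets gives the kind of input a gradient-boosted classifier such as `xgboost.XGBClassifier` is fit on. Mean pooling is used here only because it yields a fixed-length vector regardless of pocket size.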

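The two headline metrics can be illustrated with a small standard-library sketch. `f1_score` is the standard binary F1, the harmonic mean of precision and recall, with label 1 meaning an allosteric pocket. `top_k_hit_rate` assumes the top-3 figure is obtained by ranking each protein's candidate pockets by predicted score and counting the fraction of allosteric pockets that fall within the first k positions; the abstract does not state the exact evaluation protocol, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R), where label 1 marks an allosteric pocket."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_hit_rate(proteins: List[List[Tuple[float, int]]], k: int = 3) -> float:
    """Fraction of allosteric pockets ranked within the top k pockets of their protein.

    Each protein is a list of (predicted_score, label) pairs, one per candidate pocket.
    """
    hits = 0
    total = 0
    for pockets in proteins:
        # Rank this protein's pockets from highest to lowest predicted score.
        ranked = sorted(pockets, key=lambda pocket: pocket[0], reverse=True)
        for rank, (_, label) in enumerate(ranked):
            if label == 1:
                total += 1
                if rank < k:
                    hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # 2 true positives, 1 false positive, 1 false negative: precision = recall = 2/3.
    print(f1_score([1, 0, 1, 0, 1], [1, 0, 0, 1, 1]))
    toy = [
        [(0.9, 1), (0.4, 0), (0.2, 0)],            # allosteric pocket ranked 1st: hit
        [(0.8, 0), (0.7, 0), (0.6, 0), (0.5, 1)],  # allosteric pocket ranked 4th: miss
    ]
    print(top_k_hit_rate(toy, k=3))  # 0.5
```

Because F1 is computed over individual pocket predictions while the top-k rate is computed within each protein's ranked list, the two numbers capture different things: classification quality versus ranking quality.
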
AVAILABILITY AND IMPLEMENTATION

The source code is available on GitHub (https://github.com/MoaazK/deepallo) and archived on Zenodo (DOI: 10.5281/zenodo.15255379). The trained model is hosted on Hugging Face (DOI: 10.57967/hf/5198). The dataset used for training and evaluation is archived on Zenodo (DOI: 10.5281/zenodo.15255437).

Similar articles

Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.

The Black Book of Psychotropic Dosing and Monitoring.

Psychopharmacol Bull. 2024 Jul 8;54(3):8-59.

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.

Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2020 Jan 9;1(1):CD011535. doi: 10.1002/14651858.CD011535.pub3.

Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.

Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.

Mental Health First Aid as a tool for improving mental health and well-being.

Cochrane Database Syst Rev. 2023 Aug 22;8(8):CD013127. doi: 10.1002/14651858.CD013127.pub2.

ToxinPred 3.0: An improved method for predicting the toxicity of peptides.

Comput Biol Med. 2024 Sep;179:108926. doi: 10.1016/j.compbiomed.2024.108926. Epub 2024 Jul 21.

Cited by

STINGAllo: a web server for high-throughput prediction of allosteric site-forming residues using internal protein nanoenvironment descriptors.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf424.

Allosteric Coupling in Full-Length Lyn Kinase Revealed by Molecular Dynamics and Network Analysis.

Int J Mol Sci. 2025 Jun 18;26(12):5835. doi: 10.3390/ijms26125835.
