Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions.

Affiliations

School of Computing, University of Georgia, GA 30602, USA.

Institute of Bioinformatics, University of Georgia, GA 30602, USA.

Publication information

Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad046.

PMID: 36692152

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9900213/

Abstract

MOTIVATION

The human genome encodes over 500 distinct protein kinases that regulate nearly all cellular processes through the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics have identified thousands of phosphorylation sites across species, the kinases that phosphorylate the vast majority of these sites are currently unknown. Recently, there has been a major focus on developing computational models that predict kinase-substrate associations. However, most current models can only make predictions for a subset of well-studied kinases. Furthermore, the reliance on hand-curated features and the use of imbalanced training and testing datasets pose unique challenges for building accurate kinase-specific phosphorylation predictors. Motivated by the recent development of universal protein language models, which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction that offers greater investigative utility and enables substrate predictions at the whole-kinome level.

RESULTS

We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework.
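The Results paragraph describes the model's interface: an arbitrary pair of unaligned sequences (a kinase and a substrate peptide) mapped to a probability of phosphorylation. The Python sketch below illustrates, under stated assumptions, how such a pairwise interface might be used to scan a substrate protein: it enumerates every serine, threonine, and tyrosine, cuts a fixed-width peptide centred on each, and ranks the sites with a scoring function. The half-width of 5, the `-` padding character, and the helper names `candidate_windows`, `rank_sites`, and `toy_score` are hypothetical choices for illustration only; `score_fn` stands in for a trained classifier and is not the Phosformer API (the actual code is in the GitHub repository listed below).

```python
from typing import Callable, List, Tuple

# Hypothetical window settings; the exact peptide length Phosformer uses is not
# stated in the abstract and is assumed here for illustration only.
HALF_WIDTH = 5          # residues kept on each side of the candidate site
PAD = "-"               # hypothetical filler for sites near a protein terminus
ACCEPTORS = set("STY")  # serine, threonine and tyrosine phospho-acceptors


def candidate_windows(protein: str, half_width: int = HALF_WIDTH) -> List[Tuple[int, str]]:
    """Return (1-based position, peptide) for each S/T/Y, centred and padded."""
    padded = PAD * half_width + protein + PAD * half_width
    windows = []
    for i, residue in enumerate(protein):
        if residue in ACCEPTORS:
            # Residue i sits at padded[i + half_width], so this slice is centred on it.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


def rank_sites(
    kinase: str,
    protein: str,
    score_fn: Callable[[str, str], float],
) -> List[Tuple[int, str, float]]:
    """Score every (kinase, peptide) pair and sort sites by predicted probability.

    score_fn is a placeholder for a trained pair classifier; it is NOT the
    Phosformer API.
    """
    scored = [(pos, pep, score_fn(kinase, pep)) for pos, pep in candidate_windows(protein)]
    # sorted() is stable, so sites with equal scores keep their sequence order.
    return sorted(scored, key=lambda site: site[2], reverse=True)


def toy_score(kinase: str, peptide: str) -> float:
    """Toy stand-in: favour a proline at +1, as proline-directed kinases do."""
    return 1.0 if peptide[HALF_WIDTH + 1] == "P" else 0.1


if __name__ == "__main__":
    # "KINASE" is a placeholder string, not a real kinase domain sequence.
    for pos, pep, p in rank_sites("KINASE", "MSPKTAY", toy_score):
        print(pos, pep, p)
```

Replacing `toy_score` with a real pairwise model is the only change needed to score actual kinase-substrate pairs; the site enumeration step does not depend on how each pair is scored.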

AVAILABILITY AND IMPLEMENTATION

Code and data are available at https://github.com/esbgkannan/phosformer.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
