Using explainable machine learning to uncover the kinase-substrate interaction landscape.

Affiliations

School of Computing, University of Georgia, Athens, GA 30602, United States.

Institute of Bioinformatics, University of Georgia, Athens, GA 30602, United States.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae033.

DOI:10.1093/bioinformatics/btae033

PMID:38244571

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10868336/

Abstract

MOTIVATION

Phosphorylation, a post-translational modification regulated by protein kinases, plays an essential role in almost all cellular processes. Understanding how each of the nearly 500 human protein kinases selectively phosphorylates its substrates is a foundational challenge in bioinformatics and cell signaling. Although deep learning models have been a popular means to predict kinase-substrate relationships, existing models often lack interpretability and are trained on datasets skewed toward a subset of well-studied kinases.

RESULTS

Here we leverage recent peptide library datasets, generated to determine the substrate specificity profiles of 300 serine/threonine kinases, to develop an explainable Transformer model for kinase-peptide interaction prediction. The model, trained solely on primary sequences, achieved state-of-the-art performance. A multitask learning paradigm built into the model enables predictions on virtually any kinase-peptide pair, including predictions on 139 kinases not used in peptide library screens. Furthermore, we employed explainable machine learning methods to elucidate the model's inner workings. Through analysis of learned embeddings at different training stages, we demonstrate that the model employs a unique strategy of substrate prediction that considers both substrate motif patterns and kinase evolutionary features. SHapley Additive exPlanation (SHAP) analysis reveals key specificity-determining residues in the peptide sequence. Finally, we provide a web interface for predicting kinase-substrate associations for user-defined sequences and a resource for visualizing the learned kinase-substrate associations.
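As a rough illustration of what a Shapley-style attribution over peptide residues means, the sketch below computes exact Shapley values for each position of a short peptide. Everything in it is an assumption made for illustration: `toy_score` is a hypothetical hand-written scorer (not the Phosformer-ST model), the `X` masking baseline is an arbitrary choice, and the brute-force enumeration stands in for the approximations a SHAP toolkit would normally use on a real model.

```python
from itertools import combinations
from math import factorial


def toy_score(peptide: str) -> float:
    """Hypothetical scorer, NOT the published model.

    Rewards a central S/T phospho-acceptor, an arginine at P-3 and a
    proline at P+1; the motif terms only count when the acceptor is present.
    """
    center = len(peptide) // 2
    score = 0.0
    if peptide[center] in "ST":
        score += 0.2
        if peptide[center - 3] == "R":
            score += 0.5
        if peptide[center + 1] == "P":
            score += 0.3
    return score


def shapley_values(peptide: str, score_fn, baseline: str = "X") -> list[float]:
    """Exact Shapley value of every residue position.

    A position that is "absent" from a coalition is replaced by `baseline`
    (an assumed masking convention). Cost grows as 2^n, so this is only
    practical for short peptides.
    """
    n = len(peptide)

    def value(kept: set[int]) -> float:
        masked = "".join(
            aa if i in kept else baseline for i, aa in enumerate(peptide)
        )
        return score_fn(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                kept = set(subset)
                # Marginal contribution of position i to this coalition.
                phi[i] += weight * (value(kept | {i}) - value(kept))
    return phi


if __name__ == "__main__":
    peptide = "RAASPAA"  # hypothetical 7-mer with S at the center
    for pos, (aa, contribution) in enumerate(zip(peptide, shapley_values(peptide, toy_score))):
        print(f"position {pos} ({aa}): {contribution:+.3f}")
```

For this toy scorer the attributions sum to the full peptide score, and the central serine receives the largest share because both motif terms depend on it; positions the scorer never reads receive zero.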

AVAILABILITY AND IMPLEMENTATION

All code and data are available at https://github.com/esbgkannan/Phosformer-ST. Web server is available at https://phosformer.netlify.app.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a207/10868336/a7ba227f5344/btae033f1.jpg

Similar articles

Using explainable machine learning to uncover the kinase-substrate interaction landscape.

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae033.

Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions.

Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad046.

KinasePhos 3.0: Redesign and Expansion of the Prediction on Kinase-specific Phosphorylation Sites.

Genomics Proteomics Bioinformatics. 2023 Feb;21(1):228-241. doi: 10.1016/j.gpb.2022.06.004. Epub 2022 Jul 1.

Prediction of Kinase-Substrate Associations Using The Functional Landscape of Kinases and Phosphorylation Sites.

Pac Symp Biocomput. 2023;28:73-84.

Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data.

Bioinformatics. 2016 Jan 15;32(2):252-9. doi: 10.1093/bioinformatics/btv550. Epub 2015 Sep 22.

Use of an oriented peptide library to determine the optimal substrates of protein kinases.

Curr Biol. 1994 Nov 1;4(11):973-82. doi: 10.1016/s0960-9822(00)00221-9.

PhosphoPICK: modelling cellular context to map kinase-substrate phosphorylation events.

Bioinformatics. 2015 Feb 1;31(3):382-9. doi: 10.1093/bioinformatics/btu663. Epub 2014 Oct 9.

The Predikin webserver: improved prediction of protein kinase peptide specificity using structural information.

Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W286-90. doi: 10.1093/nar/gkn279. Epub 2008 May 13.

Learned protein embeddings for machine learning.

Bioinformatics. 2018 Aug 1;34(15):2642-2648. doi: 10.1093/bioinformatics/bty178.

Kinase Substrate Profiling Using a Proteome-wide Serine-Oriented Human Peptide Library.

Biochemistry. 2018 Aug 7;57(31):4717-4725. doi: 10.1021/acs.biochem.8b00410. Epub 2018 Jun 19.

Cited by

Inferring kinase-phosphosite regulation from phosphoproteome-enriched cancer multi-omics datasets.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf143.

Kinase-substrate prediction using an autoregressive model.

Comput Struct Biotechnol J. 2025 Mar 8;27:1103-1111. doi: 10.1016/j.csbj.2025.03.003. eCollection 2025.

Prediction of protein interactions with function in protein (de-)phosphorylation.

PLoS One. 2025 Mar 3;20(3):e0319084. doi: 10.1371/journal.pone.0319084. eCollection 2025.
