Using explainable machine learning to uncover the kinase-substrate interaction landscape.

Affiliations

School of Computing, University of Georgia, Athens, GA 30602, United States.

Institute of Bioinformatics, University of Georgia, Athens, GA 30602, United States.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae033.

Abstract

MOTIVATION

Phosphorylation, a post-translational modification regulated by protein kinases, plays an essential role in almost all cellular processes. Understanding how each of the nearly 500 human protein kinases selectively phosphorylates its substrates is a foundational challenge in bioinformatics and cell signaling. Although deep learning models have been a popular means of predicting kinase-substrate relationships, existing models often lack interpretability and are trained on datasets skewed toward a subset of well-studied kinases.

RESULTS

Here we leverage recent peptide library datasets, generated to determine the substrate specificity profiles of 300 serine/threonine kinases, to develop an explainable Transformer model for kinase-peptide interaction prediction. The model, trained solely on primary sequences, achieved state-of-the-art performance. A multitask learning paradigm built into the model enables predictions on virtually any kinase-peptide pair, including predictions on 139 kinases not used in peptide library screens. Furthermore, we employed explainable machine learning methods to elucidate the model's inner workings. Through analysis of learned embeddings at different training stages, we demonstrate that the model employs a unique substrate prediction strategy that considers both substrate motif patterns and kinase evolutionary features. SHapley Additive exPlanation (SHAP) analysis reveals key specificity-determining residues in the peptide sequence. Finally, we provide a web interface for predicting kinase-substrate associations for user-defined sequences and a resource for visualizing the learned kinase-substrate associations.
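To make the idea behind the SHAP analysis concrete, the following is a minimal sketch of per-residue Shapley attribution on a peptide. It is not the Phosformer-ST model or the authors' pipeline: `toy_score` is a hypothetical stand-in scorer with an invented motif preference, masking a residue with `X` is an assumed way of "hiding" a position, and the values are computed by exact enumeration, which is only feasible for short peptides (the SHAP library relies on approximations instead).

```python
from itertools import combinations
from math import factorial

MASK = "X"  # assumed placeholder for a hidden residue


def toy_score(peptide: str) -> float:
    """Hypothetical scorer, NOT the published model.

    Rewards a central S/T phosphoacceptor, plus an R three residues
    upstream and a P one residue downstream when the acceptor is present.
    """
    center = len(peptide) // 2
    score = 0.0
    if peptide[center] in "ST":
        score += 1.0
        if peptide[center - 3] == "R":
            score += 2.0
        if peptide[center + 1] == "P":
            score += 0.5
    return score


def masked(peptide: str, keep: set) -> str:
    """Return the peptide with every position outside `keep` replaced by MASK."""
    return "".join(aa if i in keep else MASK for i, aa in enumerate(peptide))


def shapley_values(peptide: str, score_fn) -> list:
    """Exact Shapley value of each position, by enumerating all subsets."""
    n = len(peptide)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                keep = set(subset)
                with_i = score_fn(masked(peptide, keep | {i}))
                without_i = score_fn(masked(peptide, keep))
                values[i] += weight * (with_i - without_i)
    return values


if __name__ == "__main__":
    peptide = "RKRSPEL"  # illustrative 7-mer centered on S
    for pos, (aa, val) in enumerate(zip(peptide, shapley_values(peptide, toy_score))):
        print(f"{pos - len(peptide) // 2:+d} {aa} {val:.2f}")
```

For this toy scorer, the central serine receives the largest attribution (2.25), followed by the arginine at position -3 (1.00) and the proline at +1 (0.25). All other positions receive zero, and the attributions sum to the full-peptide score of 3.5. The serine shares credit for the R and P bonuses because the scorer only grants them when the phosphoacceptor is present.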

AVAILABILITY AND IMPLEMENTATION

All code and data are available at https://github.com/esbgkannan/Phosformer-ST. The web server is available at https://phosformer.netlify.app.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a207/10868336/a7ba227f5344/btae033f1.jpg