Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Joint Department of Biomedical Engineering, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Bioinformatics. 2022 Apr 12;38(8):2119-2126. doi: 10.1093/bioinformatics/btac083.
Kinase-catalyzed phosphorylation of proteins forms the backbone of signal transduction within the cell, enabling the coordination of numerous processes such as the cell cycle, apoptosis, and differentiation. Although on the order of 10^5 phosphorylation events have been described, we know the specific kinase performing these functions for <5% of cases. The ability to predict which kinases initiate specific individual phosphorylation events has the potential to greatly enhance the design of downstream experimental studies, while simultaneously creating a preliminary map of the broader phosphorylation network that controls cellular signaling.
We describe Embedding-based multi-label prediction of phosphorylation events (EMBER), a deep learning method that integrates kinase phylogenetic information and motif-dissimilarity information into a multi-label classification model for the prediction of kinase-motif phosphorylation events. Unlike previous deep learning methods that perform single-label classification, we restate the task of kinase-motif phosphorylation prediction as a multi-label problem, allowing us to train a single unified model rather than a separate model for each of the 134 kinase families. We use a Siamese neural network to generate a novel vector representation, or embedding, of peptide motif sequences, and we compare this embedding to a previously proposed peptide embedding. The motif embeddings are used, together with one-hot encoded motif sequences, as input to a classification neural network, and kinase phylogenetic relationships are incorporated into the model through a kinase phylogeny-weighted loss function. Results suggest that this approach holds significant promise for improving the known map of phosphorylation relationships that underlie kinome signaling.
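The two ingredients named above, one-hot encoded motifs and a phylogeny-weighted multi-label loss, can be illustrated with a minimal sketch. It is not the EMBER implementation (see the repository for that). The function names, the toy similarity matrix, the example motif, and in particular the weighting rule (errors on kinase families close to a true family are down-weighted) are assumptions made here for illustration only; the abstract does not specify how the loss is weighted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(motif):
    """Flatten a peptide motif into a one-hot vector of length len(motif) * 20."""
    vec = []
    for residue in motif:
        row = [0.0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # unknown residues or padding stay all-zero
            row[idx] = 1.0
        vec.extend(row)
    return vec


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def phylogeny_weights(labels, similarity):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    Positive labels get weight 1. A negative label j gets weight
    1 - max similarity between family j and any positive family, so
    predicting a close relative of a true kinase is penalized less.
    """
    positives = [k for k, y in enumerate(labels) if y == 1]
    weights = []
    for j, y in enumerate(labels):
        if y == 1 or not positives:
            weights.append(1.0)
        else:
            nearest = max(similarity[j][k] for k in positives)
            weights.append(1.0 - nearest)
    return weights


def weighted_multilabel_bce(logits, labels, weights):
    """Mean weighted binary cross-entropy over independent kinase-family labels."""
    eps = 1e-12
    total = 0.0
    for z, y, w in zip(logits, labels, weights):
        p = sigmoid(z)
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


if __name__ == "__main__":
    # Toy setup: 3 kinase families with a made-up similarity matrix in [0, 1].
    similarity = [
        [1.0, 0.8, 0.1],
        [0.8, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
    motif = "RRRSLSPSSPEQKAL"   # made-up 15-residue motif
    features = one_hot_encode(motif)
    labels = [1, 0, 0]          # the motif is phosphorylated by family 0 only
    logits = [2.0, 1.0, -1.5]   # stand-in for classifier outputs
    weights = phylogeny_weights(labels, similarity)
    print(len(features), weights, weighted_multilabel_bce(logits, labels, weights))
```

Treating each kinase family as an independent sigmoid output is what makes the problem multi-label: one motif can carry several positive labels, and a single model scores all families at once instead of training one classifier per family.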
The data and code underlying this article are available in a GitHub repository at https://github.com/gomezlab/EMBER.
Supplementary data are available at Bioinformatics online.