College of Chemistry, Sichuan University, Chengdu 610064, China.
Int J Mol Sci. 2022 Feb 3;23(3):1741. doi: 10.3390/ijms23031741.
As one of the most important post-translational modifications (PTMs), phosphorylation is the covalent attachment of a phosphate group to amino acid residues such as Ser (S), Thr (T) and Tyr (Y), which gives rise to diverse functions at the molecular level. Abnormal phosphorylation has been shown to be closely related to human diseases. To our knowledge, no study has yet addressed the prediction of phosphorylation sites associated with a specific disease, although such prediction is of great significance for a comprehensive understanding of disease mechanisms. In this work, focusing on three types of leukemia, we aim to develop reliable leukemia-related phosphorylation site prediction models by combining a deep convolutional neural network (CNN) with transfer learning. A CNN can automatically discover complex representations of phosphorylation patterns from raw sequences, and hence provides a powerful tool for improving leukemia-related phosphorylation site prediction. On the largest dataset, that of myelogenous leukemia, the optimal models for S, T and Y phosphorylation sites give AUC values of 0.8784, 0.8328 and 0.7716, respectively. When transfer learning is applied to the smaller datasets, the models for T-cell and lymphoid leukemia also perform promisingly by sharing the optimal parameters. Compared with five other machine-learning methods, our CNN models show superior performance. Finally, leukemia-related pathogenesis analysis and distribution analysis of the phosphorylated proteins, together with K-means clustering analysis and position-specific conservation profiles of the phosphorylation sites, all indicate the strong practical feasibility of our easy-to-use CNN models.
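To make the idea of learning from raw sequences concrete, the following is a minimal sketch of how candidate S/T/Y sites might be turned into CNN input. It is an illustration under stated assumptions, not the preprocessing used in this work: the flank size, the padding symbol `X`, the one-hot scheme, and the function names `extract_windows` and `one_hot` are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def extract_windows(sequence, residue, flank=7):
    """Return (position, window) pairs centered on each occurrence of `residue`.

    `flank` is an assumed half-width; each window has 2 * flank + 1 residues.
    """
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa == residue:
            # In `padded`, residue i sits at index i + flank, so the window
            # centered on it is padded[i : i + 2 * flank + 1].
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def one_hot(window):
    """Encode a window as rows of 20 zeros/ones; padding becomes an all-zero row."""
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in window:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    protein = "MKSPTAYLSR"  # toy sequence, not taken from the paper's datasets
    for pos, win in extract_windows(protein, "S", flank=3):
        matrix = one_hot(win)
        print(pos, win, len(matrix), len(matrix[0]))
```

Each resulting matrix (window length by 20) is the kind of fixed-size input a convolutional layer can scan for local sequence motifs around the candidate site. Separate calls with `"S"`, `"T"` and `"Y"` would give one set of inputs per residue type.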