School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
Anal Biochem. 2021 Jan 1;612:113955. doi: 10.1016/j.ab.2020.113955. Epub 2020 Sep 16.
Phosphorylation is a ubiquitous post-translational modification (PTM), occurring in both eukaryotic and prokaryotic cells, in which a phosphate group is attached to an amino acid residue. The target residues, namely serine (S), threonine (T), and tyrosine (Y), exhibit diverse functions at the molecular level. Recent studies have determined that some diseases, such as cancer, diabetes, and neurodegenerative diseases, are caused by abnormal phosphorylation. Because of its potential applications in biological research and drug development, the large-scale identification of phosphorylation sites has attracted interest. Existing wet-lab technologies for identifying phosphorylation sites are expensive and time-consuming. Thus, computational algorithms that can accelerate the annotation of phosphorylation sites in massive sets of protein sequences are needed. Numerous machine learning-based methods have been developed for phosphorylation site prediction. However, despite extensive efforts, existing computational approaches still perform inadequately, particularly in terms of overall accuracy (ACC), Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC). In this paper, we report DeepPPSite, a novel deep learning-based predictor of phosphorylation sites built on a stacked long short-term memory (LSTM) recurrent network, to overcome these performance hurdles. The proposed technique efficiently learns protein representations from conjoint protein descriptors. In a 10-fold cross-validation test on the training dataset, our model achieved superior performance for S, T, and Y sites, with MCC values of 0.608, 0.602, and 0.558, respectively. We further assessed the generalization ability of DeepPPSite with a rigorous independent test, in which the MCC values were 0.358, 0.356, and 0.350 for the S, T, and Y phosphorylation sites, respectively. Rigorous cross-validation and independent validation tests for the three types of phosphorylation sites demonstrated that DeepPPSite significantly outperforms state-of-the-art methods.
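For readers less familiar with the MCC values reported above, the following is a minimal sketch of how the Matthews correlation coefficient can be computed for a binary site/non-site prediction task. It is not the authors' evaluation code: the function names `confusion_counts` and `mcc` are hypothetical, the labels in the usage example are made up for illustration, and returning 0.0 when the denominator vanishes is a common convention assumed here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = phosphorylation site, 0 = non-site)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Assumption: return 0.0 when the denominator is zero (MCC is undefined there).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative labels only; not data from the paper.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(mcc(*confusion_counts(y_true, y_pred)))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four confusion-matrix cells, which is why it is often preferred to accuracy when positive and negative sites are imbalanced.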