Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA.
Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Sci Rep. 2021 Jun 15;11(1):12550. doi: 10.1038/s41598-021-91840-w.
Protein phosphorylation, one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning-based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks achieves the best performance in predicting phosphorylation sites in C. reinhardtii. For this model, named Chlamy-EnPhosSite, the best measured AUC and MCC in independent testing on a combined dataset of serine (S) and threonine (T) sites are 0.90 and 0.64, respectively, higher than the corresponding values for other predictors. When applied to the entire C. reinhardtii proteome (1,809,304 S and T sites in total), Chlamy-EnPhosSite predicted 499,411 phosphorylated sites at a cut-off of 0.5 and 237,949 at a cut-off of 0.7. In a blinded study, these predictions were compared with an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS): Chlamy-EnPhosSite correctly predicted approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites at a probability cut-off of 0.5, and 76.83% at the more stringent cut-off of 0.7. Notably, Chlamy-EnPhosSite also predicted experimentally confirmed phosphorylation sites in a protein sequence that did not appear in the training dataset (e.g., RPS6 S245), highlighting its prediction accuracy and the value of predictions for identifying biologically relevant PTM sites. These results indicate that our method is a robust, complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii and has the potential to serve as a useful tool for the community. Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.
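The quantities reported above (per-site probability scores, the 0.5 and 0.7 cut-offs, AUC, MCC, and the share of LC-MS/MS-confirmed sites recovered) can be illustrated with a minimal Python sketch. It assumes the CNN and LSTM outputs are combined by simple probability averaging and that a site is called phosphorylated when its score is at or above the cut-off. Both are assumptions for illustration, not the published Chlamy-EnPhosSite procedure, and the function names and toy scores are hypothetical.

```python
import math
from typing import Sequence


def ensemble_scores(cnn_probs: Sequence[float], lstm_probs: Sequence[float]) -> list[float]:
    """Soft-voting ensemble: average the per-site probabilities of two base models.

    ASSUMPTION: plain averaging is used for illustration only; the combination
    rule actually used by Chlamy-EnPhosSite is not stated in the abstract.
    """
    if len(cnn_probs) != len(lstm_probs):
        raise ValueError("both models must score the same sites")
    return [(c + l) / 2 for c, l in zip(cnn_probs, lstm_probs)]


def call_sites(scores: Sequence[float], cutoff: float) -> set[int]:
    """Indices of S/T sites called phosphorylated (ASSUMPTION: score >= cutoff)."""
    return {i for i, s in enumerate(scores) if s >= cutoff}


def mcc(labels: Sequence[int], predicted: set[int]) -> float:
    """Matthews correlation coefficient from 0/1 labels and predicted-positive indices."""
    tp = fp = tn = fn = 0
    for i, y in enumerate(labels):
        hit = i in predicted
        if y == 1 and hit:
            tp += 1
        elif y == 1:
            fn += 1
        elif hit:
            fp += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability a positive site outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fraction_recovered(predicted: set[int], confirmed: set[int]) -> float:
    """Share of experimentally confirmed sites that the predictor also calls."""
    return len(predicted & confirmed) / len(confirmed)


if __name__ == "__main__":
    # Toy scores for four hypothetical S/T sites (not data from the study).
    scores = ensemble_scores([0.875, 0.25, 0.625, 0.75], [0.625, 0.5, 0.625, 0.5])
    labels = [1, 0, 1, 0]   # hypothetical ground truth
    confirmed = {0, 2}      # hypothetical "LC-MS/MS-confirmed" sites
    print("AUC", roc_auc(labels, scores))
    for cutoff in (0.5, 0.7):
        called = call_sites(scores, cutoff)
        print(cutoff, sorted(called), round(mcc(labels, called), 3),
              fraction_recovered(called, confirmed))
```

Because AUC is computed from score rankings, it does not depend on the cut-off, whereas the set of called sites, MCC, and the recovered fraction all do. Under this sketch, raising the cut-off can only shrink the called set, so the fraction of confirmed sites recovered cannot increase, which is consistent with the drop from 0.5 to 0.7 reported above.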