National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi - 110067, India.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac249.
Although several in silico tools are available for predicting phosphorylation sites in mammalian, yeast or plant proteins, no software is currently available for predicting phosphosites in Plasmodium proteins. However, the availability of a significant amount of phosphoproteomics data over the last decade, together with advances in machine learning (ML) algorithms, has opened up opportunities for deciphering the phosphorylation patterns of the plasmodial system and for developing ML-based phosphosite prediction tools for Plasmodium. We have developed Pf-Phospho, an ML-based method that predicts phosphosites using Random Forest classifiers trained on a large data set of 12 096 phosphosites from Plasmodium falciparum and Plasmodium berghei. Of the 12 096 known phosphosites, 75% were used for training/validation of the classifier, while the remaining 25% were held out as completely unseen data for blind testing. Pf-Phospho predicts kinase-independent phosphosites with 84% sensitivity, 75% specificity and 78% precision. It can also predict kinase-specific phosphosites with high accuracy for five plasmodial kinases: PfPKG, Plasmodium falciparum, PfPKA, PfPK7 and PbCDPK4. Pf-Phospho (http://www.nii.ac.in/pfphospho.html) outperforms other widely used phosphosite prediction tools, which have been trained using mammalian phosphoproteome data. It has also been integrated with other widely used resources such as PlasmoDB, MPMP and Pfam, as well as the recently available ML-based structures predicted by AlphaFold2. Currently, Pf-Phospho is the only bioinformatics resource available for ML-based prediction of phospho-signaling networks of Plasmodium, and it provides a user-friendly platform for integrative analysis of phospho-signaling along with metabolic and protein-protein interaction networks.
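To make the evaluation protocol in the abstract concrete, the following is a minimal sketch of a 75%/25% hold-out split and of how sensitivity, specificity and precision are computed from binary predictions. It is an illustration only, not the Pf-Phospho implementation: the function names `split_train_test` and `binary_metrics`, the random seed and the toy labels are all assumptions, and the abstract does not say how the split was drawn (for example, whether it was stratified) or how negative sites were chosen.

```python
import random


def split_train_test(sites, test_fraction=0.25, seed=0):
    """Shuffle the sites and hold out `test_fraction` of them as unseen test data.

    Hypothetical helper: the abstract only states the 75%/25% proportions.
    """
    shuffled = list(sites)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train/validation, blind test)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and precision for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": _ratio(tp, tp + fn),  # fraction of real phosphosites recovered
        "specificity": _ratio(tn, tn + fp),  # fraction of non-sites correctly rejected
        "precision": _ratio(tp, tp + fp),    # fraction of predicted sites that are real
    }


if __name__ == "__main__":
    # Integer placeholders stand in for the 12 096 phosphosite records.
    train, test = split_train_test(range(12096))
    print(len(train), len(test))

    # Toy labels, purely illustrative (not data from the paper).
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

With 12 096 sites, a 25% hold-out corresponds to 3024 test sites and 9072 training/validation sites. The metrics function would be applied only to predictions on the held-out portion, which is what makes the reported figures a blind test.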