Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China.
Department of Chemical Biology, Key Laboratory for Chemical Biology of Fujian Province, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China.
Genomics Proteomics Bioinformatics. 2024 Sep 13;22(3). doi: 10.1093/gpbjnl/qzae032.
Protein N-phosphorylation is widespread in nature and participates in various biological processes. However, current knowledge of N-phosphorylation is extremely limited compared with that of O-phosphorylation. In this study, we collected 11,710 experimentally verified N-phosphosites on 7344 proteins from 39 species and constructed the database Nphos to share up-to-date information on protein N-phosphorylation. Using these substantial data, we characterized the sequence and structural features of protein N-phosphorylation. Moreover, after comparing hundreds of learning models, we selected and optimized gradient boosting decision tree (GBDT) models to predict three types of human N-phosphorylation, achieving mean area under the receiver operating characteristic curve (AUC) values of 90.56%, 91.24%, and 92.01% for pHis, pLys, and pArg, respectively. Applying these models, we also identified 488,825 distinct putative N-phosphosites in the human proteome. The models are deployed in Nphos for interactive N-phosphosite prediction. In summary, this work provides new insights and starting points for both flexible and targeted investigations of N-phosphorylation. By providing a data and technical foundation, it will also facilitate a deeper and more systematic understanding of protein N-phosphorylation. Nphos is freely available at http://www.bio-add.org/Nphos/ and http://ppodd.org.cn/Nphos/.
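The performance figures above are mean AUC values. As a hedged illustration of what such a number measures (not the Nphos evaluation code), the sketch below computes AUC as the probability that a randomly chosen positive site receives a higher model score than a randomly chosen negative site, with ties counting one half, and then averages it over folds. It assumes, hypothetically, that the mean is taken over cross-validation folds; the helper names `roc_auc` and `mean_auc` and the toy labels and scores are invented for illustration and are not data from the study.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability that a positive outranks a negative.

    labels: sequence of 0/1 (1 = N-phosphosite, 0 = non-site).
    scores: model outputs (e.g. GBDT probabilities), same length as labels.
    A tie between a positive and a negative contributes 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, e.g. hypothetical CV folds."""
    return mean(roc_auc(y, s) for y, s in folds)


# Hypothetical toy folds of (labels, scores); not results from the paper.
folds = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.1]),  # 3 of 4 pos/neg pairs ordered correctly
    ([1, 0, 1, 0], [0.8, 0.2, 0.4, 0.3]),  # all 4 pairs ordered correctly
]
print(f"mean AUC = {mean_auc(folds):.2%}")  # prints: mean AUC = 87.50%
```

The pairwise loop is O(n·m) and is chosen here only for readability; for proteome-scale candidate sets a rank-based computation or a library routine would be the practical choice.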