PPSNO: A Feature-Rich SNO Sites Predictor by Stacking Ensemble Strategy from Protein Sequence-Derived Information.

Affiliations

School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.

College of Economics and Management, Nanjing Forestry University, Nanjing, 210037, China.

Publication information

Interdiscip Sci. 2024 Mar;16(1):192-217. doi: 10.1007/s12539-023-00595-7. Epub 2024 Jan 11.

DOI:10.1007/s12539-023-00595-7

PMID:38206557

Abstract

Protein S-nitrosylation (SNO) is a significant post-translational modification that affects the stability, activity, cellular localization, and function of proteins. Highly accurate prediction of SNO sites therefore aids in understanding biological function mechanisms. In this study, we constructed a predictor, named PPSNO, that predicts protein SNO sites using stacked ensemble learning. PPSNO integrates multiple machine learning techniques into an ensemble model, enhancing its predictive accuracy. First, we established benchmark datasets by collecting SNO sites from various sources, including literature, databases, and other predictors. Second, we applied several feature extraction techniques to derive features from protein sequences, which were then fused and used to train the PPSNO predictor. Five-fold cross-validation experiments show that PPSNO outperformed existing predictors, such as PSNO, PreSNO, pCysMod, DeepNitro, RecSNO, and Mul-SNO. The PPSNO predictor achieved an accuracy of 92.8%, an area under the curve (AUC) of 96.1%, a Matthews correlation coefficient (MCC) of 81.3%, an F1-score of 85.6%, a sensitivity (SN) of 79.3%, a specificity (SP) of 97.7%, and an average precision (AP) of 92.2%. We also used ROC curves, PR curves, and radar plots to illustrate the performance of PPSNO. Our study shows that fused protein sequence features and a two-layer stacked ensemble model can improve the accuracy of predicting SNO sites, which can aid in understanding cellular processes and disease mechanisms. The code and data are available at https://github.com/serendipity-wly/PPSNO.
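The two-layer stacking and five-fold cross-validation described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the abstract does not name the base learners, the meta-learner, or the feature encodings, so the random forest, SVM, and logistic regression used here are assumptions, and the feature matrix is synthetic data standing in for fused sequence-derived features. The sketch reports the same seven metrics listed in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             confusion_matrix, f1_score,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for fused sequence-derived feature vectors
# (assumption: this is NOT real SNO data).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           weights=[0.7, 0.3], random_state=0)

# Layer 1: heterogeneous base learners (illustrative choices only).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# Layer 2: a meta-learner fitted on out-of-fold base-learner probabilities.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)

# Outer five-fold cross-validation: every sample is scored by a model
# that never saw it during training.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(stack, X, y, cv=outer_cv,
                          method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

# Confusion-matrix counts give sensitivity (SN) and specificity (SP).
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
metrics = {
    "ACC": accuracy_score(y, pred),
    "AUC": roc_auc_score(y, proba),
    "MCC": matthews_corrcoef(y, pred),
    "F1": f1_score(y, pred),
    "SN": tp / (tp + fn),
    "SP": tn / (tn + fp),
    "AP": average_precision_score(y, proba),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The inner `cv=5` of `StackingClassifier` keeps the meta-learner from training on predictions that the base learners made for their own training samples, while the outer `StratifiedKFold` provides the held-out estimates. The printed values depend entirely on the synthetic data and are not comparable to the figures reported for PPSNO.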


Similar articles

pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model.

BMC Bioinformatics. 2023 Feb 8;24(1):41. doi: 10.1186/s12859-023-05164-9.

SNO-DCA: A model for predicting S-nitrosylation sites based on densely connected convolutional networks and attention mechanism.

Heliyon. 2023 Dec 3;10(1):e23187. doi: 10.1016/j.heliyon.2023.e23187. eCollection 2024 Jan 15.

PSNO: predicting cysteine S-nitrosylation sites by incorporating various sequence-derived features into the general form of Chou's PseAAC.

Int J Mol Sci. 2014 Jun 25;15(7):11204-19. doi: 10.3390/ijms150711204.

Prediction of S-nitrosylation sites by integrating support vector machines and random forest.

Mol Omics. 2019 Dec 2;15(6):451-458. doi: 10.1039/c9mo00098d.

An efficient support vector machine approach for identifying protein S-nitrosylation sites.

Protein Pept Lett. 2011 Jun;18(6):573-87. doi: 10.2174/092986611795222731.

Identification of S-nitrosylation sites based on multiple features combination.

Sci Rep. 2019 Feb 28;9(1):3098. doi: 10.1038/s41598-019-39743-9.

iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition.

PLoS One. 2013;8(2):e55844. doi: 10.1371/journal.pone.0055844. Epub 2013 Feb 7.

DeepNitro: Prediction of Protein Nitration and Nitrosylation Sites by Deep Learning.

Genomics Proteomics Bioinformatics. 2018 Aug;16(4):294-306. doi: 10.1016/j.gpb.2018.04.007. Epub 2018 Sep 27.

DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder.

Protein Pept Lett. 2021;28(6):708-721. doi: 10.2174/0929866527666201202103411.

References cited by this article

Identification of adaptor proteins by incorporating deep learning and PSSM profiles.

Methods. 2023 Jan;209:10-17. doi: 10.1016/j.ymeth.2022.11.001. Epub 2022 Nov 22.

XGBoost-Based Feature Learning Method for Mining COVID-19 Novel Diagnostic Markers.

Front Public Health. 2022 Jun 22;10:926069. doi: 10.3389/fpubh.2022.926069. eCollection 2022.

AMMU: A survey of transformer-based biomedical pretrained language models.

J Biomed Inform. 2022 Feb;126:103982. doi: 10.1016/j.jbi.2021.103982. Epub 2021 Dec 31.

Predicting S-nitrosylation proteins and sites by fusing multiple features.

Math Biosci Eng. 2021 Oct 25;18(6):9132-9147. doi: 10.3934/mbe.2021450.

Mul-SNO: A Novel Prediction Tool for S-Nitrosylation Sites Based on Deep Learning Methods.

IEEE J Biomed Health Inform. 2022 May;26(5):2379-2387. doi: 10.1109/JBHI.2021.3123503. Epub 2022 May 5.

pCysMod: Prediction of Multiple Cysteine Modifications Based on Deep Learning Framework.

Front Cell Dev Biol. 2021 Feb 23;9:617366. doi: 10.3389/fcell.2021.617366. eCollection 2021.

Protein-protein interaction site prediction using random forest proximity distance.

J Bioinform Comput Biol. 2021 Feb;19(1):2050042. doi: 10.1142/S0219720020500420. Epub 2020 Nov 19.

CatBoost for big data: an interdisciplinary review.

J Big Data. 2020;7(1):94. doi: 10.1186/s40537-020-00369-8. Epub 2020 Nov 4.

Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation.

Front Physiol. 2019 Dec 10;10:1501. doi: 10.3389/fphys.2019.01501. eCollection 2019.

Prediction of S-nitrosylation sites by integrating support vector machines and random forest.

Mol Omics. 2019 Dec 2;15(6):451-458. doi: 10.1039/c9mo00098d.
