PPSNO: A Feature-Rich SNO Sites Predictor by Stacking Ensemble Strategy from Protein Sequence-Derived Information.

Affiliations

School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.

College of Economics and Management, Nanjing Forestry University, Nanjing, 210037, China.

Publication information

Interdiscip Sci. 2024 Mar;16(1):192-217. doi: 10.1007/s12539-023-00595-7. Epub 2024 Jan 11.

Abstract

Protein S-nitrosylation (SNO) is an important post-translational modification that affects the stability, activity, subcellular localization, and function of proteins. Accurate prediction of SNO sites therefore helps elucidate the mechanisms underlying protein function. In this study, we constructed a predictor, named PPSNO, that predicts protein SNO sites using stacked ensemble learning. PPSNO combines multiple machine learning models into an ensemble, which improves its predictive accuracy. First, we established benchmark datasets by collecting SNO sites from various sources, including the literature, databases, and other predictors. Second, several feature-extraction techniques were applied to derive features from protein sequences, and these features were fused and fed into PPSNO for training. Five-fold cross-validation experiments showed that PPSNO outperformed existing predictors such as PSNO, PreSNO, pCysMod, DeepNitro, RecSNO, and Mul-SNO. PPSNO achieved an accuracy of 92.8%, an area under the ROC curve (AUC) of 96.1%, a Matthews correlation coefficient (MCC) of 81.3%, an F1-score of 85.6%, a sensitivity (SN) of 79.3%, a specificity (SP) of 97.7%, and an average precision (AP) of 92.2%. We also used ROC curves, precision-recall (PR) curves, and radar plots to illustrate the superior performance of PPSNO. Our results show that fusing protein sequence features with a two-layer stacked ensemble model can improve the accuracy of SNO site prediction, which may aid in understanding cellular processes and disease mechanisms. The code and data are available at https://github.com/serendipity-wly/PPSNO .
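The abstract does not list PPSNO's individual feature encodings. As a minimal illustration of what "sequence-derived" features for site prediction can look like, the sketch below cuts a fixed-length window centred on each cysteine (the residue that undergoes S-nitrosylation) and encodes it by amino acid composition (AAC). The window half-width of 10, the `X` padding character, and the function names `cysteine_windows` and `aac` are hypothetical choices for this example, not PPSNO's actual settings.

```python
from collections import Counter
from typing import List, Tuple

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cysteine_windows(seq: str, half_width: int = 10, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (index, window) for every cysteine in seq.

    Each window has length 2 * half_width + 1 and is centred on the cysteine;
    positions outside the sequence are filled with `pad`.
    (half_width=10 and pad="X" are illustrative assumptions.)
    """
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # seq[i] sits at padded[i + half_width], so the window starts at padded[i].
            windows.append((i, padded[i : i + 2 * half_width + 1]))
    return windows


def aac(window: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue, ignoring padding."""
    counts = Counter(window)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


if __name__ == "__main__":
    for pos, win in cysteine_windows("MKCAAC", half_width=2):
        print(pos, win, aac(win)[:2])  # first two entries: fractions of A and C
```

Composition is only one of many possible encodings; a real predictor would typically concatenate several such vectors per window before training.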
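The two-layer stacking idea can be sketched as follows, under stated assumptions. In the first layer, each base learner is trained with k-fold splitting so that every training sample receives an out-of-fold score from a model that never saw it; in the second layer, a meta-learner is trained on those scores. The base learners here (`fit_nearest_centroid` and a plain gradient-descent `fit_logistic`), the logistic meta-learner, and `k=5` inside `fit_stacking` are hypothetical stand-ins chosen so the example runs with the standard library alone; they are not the models PPSNO stacks, and this internal k is separate from the five-fold cross-validation used for evaluation.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Scorer = Callable[[Vector], float]
# A learner fits on (X, y) and returns a function mapping a sample to a score in [0, 1].
Learner = Callable[[Sequence[Vector], Sequence[int]], Scorer]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Vector], y: Sequence[int], lr: float = 0.5, epochs: int = 300) -> Scorer:
    """Hypothetical learner: logistic regression trained by full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_nearest_centroid(X: Sequence[Vector], y: Sequence[int]) -> Scorer:
    """Hypothetical learner: score near 1 when x is closer to the positive-class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            raise ValueError("both classes must be present in every training split")
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def score(x: Vector) -> float:
        d_neg, d_pos = math.dist(x, centroids[0]), math.dist(x, centroids[1])
        return d_neg / (d_neg + d_pos) if d_neg + d_pos > 0 else 0.5

    return score


def fit_stacking(X, y, base_learners: List[Learner], meta_learner: Learner,
                 k: int = 5, seed: int = 0) -> Scorer:
    """Two-layer stacking: out-of-fold base-learner scores become the meta-learner's inputs."""
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]

    # Layer 1: meta_X[i][m] is base learner m's score for sample i,
    # produced by a model trained on the other k - 1 folds only.
    meta_X = [[0.0] * len(base_learners) for _ in range(n)]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for m, learn in enumerate(base_learners):
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta_X[i][m] = model(X[i])

    # Refit base learners on all data for prediction; Layer 2 learns from meta_X.
    full_models = [learn(X, y) for learn in base_learners]
    meta_model = meta_learner(meta_X, y)
    return lambda x: meta_model([model(x) for model in full_models])


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D data: positives near (1, 1), negatives near (0, 0).
    X = [[rng.gauss(c, 0.2), rng.gauss(c, 0.2)] for c in (1.0, 0.0) for _ in range(20)]
    y = [1] * 20 + [0] * 20
    predict = fit_stacking(X, y, [fit_nearest_centroid, fit_logistic], fit_logistic)
    print(predict([1.0, 1.0]), predict([0.0, 0.0]))
```

The out-of-fold step is what keeps the meta-learner from simply trusting base models that have memorized their own training data. In practice, a library class such as scikit-learn's `StackingClassifier` implements the same scheme; the hand-written version only makes the data flow explicit.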
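The reported metrics follow their standard definitions. A minimal sketch, assuming binary labels with SNO sites as the positive class and both classes present: the threshold metrics (SN, SP, ACC, precision, F1, MCC) are computed from confusion-matrix counts, and AUC from scores as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The function names `threshold_metrics` and `roc_auc` are illustrative, and average precision is not shown.

```python
import math
from typing import Dict, Sequence


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes at least one positive and one negative sample (tp + fn > 0, tn + fp > 0).
    """
    sn = tp / (tp + fn)                      # sensitivity (recall on SNO sites)
    sp = tn / (tn + fp)                      # specificity (recall on non-sites)
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sn / (precision + sn) if precision + sn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # lies in [-1, 1]
    return {"SN": sn, "SP": sp, "ACC": acc, "Precision": precision, "F1": f1, "MCC": mcc}


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts and scores, not PPSNO's results.
    print(threshold_metrics(tp=40, fp=10, tn=40, fn=10))
    print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

Note that MCC is a coefficient in [-1, 1]; the 81.3% reported above corresponds to an MCC of 0.813. The pairwise AUC is quadratic in the number of samples, which is fine for illustration but slower than a sort-based computation on large datasets.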
