DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder.

Affiliation

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.

Publication information

Protein Pept Lett. 2021;28(6):708-721. doi: 10.2174/0929866527666201202103411.

Abstract

BACKGROUND

S-sulfenylation (S-sulphenylation, or sulfenic acid formation) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes, such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed to predict cysteine sulfenylation sites. However, the performance of these models has been unsatisfactory owing to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine.

OBJECTIVE

This study aims to establish a robust and novel computational predictor for discriminating sulfenylation sites from non-sulfenylation sites.

METHODS

We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an nSegmented hybrid feature scheme. The synthetic minority oversampling technique (SMOTE) was then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network (2D-CNN) was employed, with a rigorous 10-fold jackknife cross-validation technique used for model validation.
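
The oversampling step described above can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy feature vectors are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    `minority` is a list of equal-length numeric feature vectors.
    All names and defaults here are illustrative assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy usage: four hypothetical SC-site feature vectors, six synthetic ones.
sc_sites = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(sc_sites, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In practice the oversampling would be applied to the training folds only, so that no synthetic points leak into the validation data.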

RESULTS

Under the proposed framework, a strong discrete representation of the feature space, the machine learning engine, and an unbiased presentation of the underlying training data yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in MCC than the best existing method. On an independent dataset, the best existing study failed to provide sufficient details. Compared with the second-best method, the model obtained an increase of 7.5% in accuracy (ACC), 1.22% in sensitivity (Sn), 12.91% in specificity (Sp), and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets in comparison with existing studies.
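
For readers less familiar with the four reported measures, the sketch below computes ACC, Sn, Sp, and MCC from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, Sn, Sp, and MCC from confusion-matrix counts.

    tp/fn count true sites predicted correctly/incorrectly;
    tn/fp count non-sites predicted correctly/incorrectly.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, chosen only to show the calculation.
scores = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 4) for name, value in scores.items()})
```

MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy alone when the classes are as imbalanced as SC-sites and non-SC sites.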

CONCLUSION

In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes on a training dataset and an independent validation dataset have revealed the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as the novel discriminative feature encoding schemes, the SMOTE technique, and the careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our work will provide potential insight for further prediction of S-sulfenylation characteristics and functionalities. Thus, we hope that our predictor will be significantly helpful for the large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
