ResSUMO: A Deep Learning Architecture Based on Residual Structure for Prediction of Lysine SUMOylation Sites.

Affiliations

College of Computer Science and Technology, Qingdao University, Qingdao 266071, China.

Dawning International Information Industry, Co., Ltd., Qingdao 266101, China.

Publication information

Cells. 2022 Aug 25;11(17):2646. doi: 10.3390/cells11172646.

Abstract

Lysine SUMOylation plays an essential role in various biological functions. Several approaches integrating various algorithms have been developed to predict SUMOylation sites, but they were built on limited datasets. Recently, the number of identified SUMOylation sites has increased significantly owing to proteomics-scale investigations. We collected modification data and found that the reported approaches performed poorly on it. It is therefore essential to explore the characteristics of this modification and to construct prediction models with improved performance based on an enlarged dataset. In this study, we constructed and compared 16 classifiers by combining four different algorithms with four encoding features selected from 11 sequence-based or physicochemical features. We found that the convolutional neural network (CNN) model integrated with a residual structure, dubbed ResSUMO, performed favorably compared with the traditional machine learning and CNN models in both cross-validation and independent tests. The area under the receiver operating characteristic (ROC) curve for ResSUMO was around 0.80, superior to that of the reported predictors. We also found that increasing the depth of the neural networks in the CNN models did not improve prediction performance because of the degradation problem, but that a residual structure could be included to optimize the networks and improve performance. This indicates that residual neural networks have the potential to be broadly applied to the prediction of other types of modification sites with great effectiveness and robustness. Furthermore, the online ResSUMO service is freely accessible.
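
The residual idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the ResSUMO architecture, whose layer sizes and encodings are not given here. It assumes a single-channel 1-D signal, odd-length kernels, and two hypothetical helpers (`plain_block`, `residual_block`) written in plain Python. The point it shows is that a residual block computes relu(F(x) + x), so when the learned transformation F contributes nothing the block still passes its input through, whereas a plain stacked block does not. This is one common intuition for why skip connections ease the degradation problem in deeper networks.

```python
from typing import List

Vector = List[float]


def conv1d_same(x: Vector, kernel: Vector) -> Vector:
    """Single-channel 1-D cross-correlation with zero padding.

    Assumes an odd kernel length so the output has the same length as x.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(x))
    ]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def plain_block(x: Vector, k1: Vector, k2: Vector) -> Vector:
    """Two stacked convolutions without a skip connection (hypothetical)."""
    h = relu(conv1d_same(x, k1))
    return relu(conv1d_same(h, k2))


def residual_block(x: Vector, k1: Vector, k2: Vector) -> Vector:
    """Same two convolutions, but the input is added back before the final
    activation: output = relu(F(x) + x). Illustrative only."""
    h = relu(conv1d_same(x, k1))
    f = conv1d_same(h, k2)
    return relu([fi + xi for fi, xi in zip(f, x)])


if __name__ == "__main__":
    x = [0.0, 1.0, 0.5, 2.0]      # toy non-negative input
    zero = [0.0, 0.0, 0.0]        # kernels that contribute nothing
    print(plain_block(x, zero, zero))     # the signal is lost
    print(residual_block(x, zero, zero))  # the input passes through
```

With all-zero kernels the plain block outputs zeros, while the residual block returns the non-negative input unchanged, so extra residual layers can fall back to an identity mapping instead of degrading the signal.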

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f5b/9454673/2897b4c55a04/cells-11-02646-g001.jpg
