Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
Nucleic Acids Res. 2024 Jul 5;52(W1):W238-W247. doi: 10.1093/nar/gkae346.
Small ubiquitin-like modifiers (SUMOs) are tiny but important protein regulators involved in orchestrating a broad spectrum of biological processes, either by covalently modifying protein substrates or by noncovalently interacting with other proteins. Here, we report an updated server, GPS-SUMO 2.0, for the prediction of SUMOylation sites and SUMO-interacting motifs (SIMs). For predictor training, we adopted three machine learning algorithms, penalized logistic regression (PLR), a deep neural network (DNN), and a transformer, and used 52 404 nonredundant SUMOylation sites in 8262 proteins and 163 SIMs in 102 proteins. To further increase the accuracy of predicting SUMOylation sites, a pretraining model was first constructed using 145 545 protein lysine modification sites, followed by transfer learning to fine-tune the model. GPS-SUMO 2.0 exhibited greater accuracy in predicting SUMOylation sites than did other existing tools. For users, one or multiple protein sequences or identifiers can be input, and the prediction results are shown in a tabular list. In addition to the basic statistics, we integrated knowledge from 35 public resources to annotate SUMOylation sites or SIMs. The GPS-SUMO 2.0 server is freely available at https://sumo.biocuckoo.cn/. We believe that GPS-SUMO 2.0 can serve as a useful tool for further analysis of SUMOylation and SUMO interactions.
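As a rough illustration of the input that a lysine modification site predictor typically works on, the sketch below enumerates every lysine in a protein sequence and cuts out a fixed-length peptide centered on it. This is not the GPS-SUMO 2.0 implementation: the function name `lysine_windows`, the flank length, and the `*` padding character are assumptions chosen for illustration, and the abstract does not state the window size used by the server.

```python
from typing import List, Tuple


def lysine_windows(sequence: str, flank: int = 10, pad: str = "*") -> List[Tuple[int, str]]:
    """Return (1-based position, peptide) for every lysine in the sequence.

    Each peptide has length 2 * flank + 1 with the lysine in the middle.
    Windows that run past either terminus are filled with `pad`.
    The flank length and pad character are illustrative assumptions.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # In padded coordinates the lysine sits at index i + flank,
            # so the window starts at i and spans 2 * flank + 1 characters.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    for position, peptide in lysine_windows("MKVLAAGIKTEEDS", flank=3):
        print(position, peptide)
```

Each peptide produced this way would then be encoded and scored by a trained model. Padding the termini keeps every window the same length, which simplifies feeding the peptides to fixed-input models such as logistic regression or a neural network.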