Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier.

Affiliations

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.

Publication information

J Mol Graph Model. 2021 Sep;107:107962. doi: 10.1016/j.jmgm.2021.107962. Epub 2021 Jun 15.

DOI:10.1016/j.jmgm.2021.107962

PMID:34198216

Abstract

Ubiquitination is a common and reversible post-translational protein modification that regulates apoptosis and plays an important role in protein degradation and cell diseases. However, experimental identification of protein ubiquitination sites is usually time-consuming and labor-intensive, so effective computational predictors are needed. In this study, we propose a ubiquitination site prediction method based on multi-view features, namely UbiSite-XGBoost. First, we use seven single-view feature encoding methods to convert protein sequence fragments into numerical feature vectors. Second, the least absolute shrinkage and selection operator (LASSO) is applied to remove redundant information and obtain the optimal feature subsets. Finally, these features are fed into the eXtreme gradient boosting (XGBoost) classifier to predict ubiquitination sites. Five-fold cross-validation shows that the AUC values on the Set1-Set6 datasets are 0.8258, 0.7592, 0.7853, 0.8345, 0.8979 and 0.8901, respectively. The synthetic minority oversampling technique (SMOTE) is applied to the unbalanced Set4-Set6 datasets, after which the AUC values are 0.9777, 0.9782 and 0.9860, respectively. In addition, we constructed three independent test datasets, on which the AUC values are 0.8007, 0.6897 and 0.7280, respectively. The results show that the proposed method UbiSite-XGBoost is superior to other ubiquitination prediction methods and provides new guidance for the identification of ubiquitination sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSite-XGBoost/.
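
To make two ingredients named in the abstract concrete, the following minimal Python sketch shows SMOTE-style oversampling of a minority class and an AUC computation. It is not taken from the UbiSite-XGBoost repository: the function names `smote_oversample` and `auc_score`, the parameter `k`, and the toy data are assumptions introduced here for illustration, and only the standard library is used.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy minority class (hypothetical 2-D feature vectors).
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote_oversample(toy_minority, n_new=3, k=2, seed=1))
    # Toy labels and classifier scores (hypothetical).
    print(auc_score([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

In a cross-validation setting, oversampling is typically applied only to the training folds, so that synthetic points do not enter the fold used to compute the AUC.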

Similar articles

SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting.

Bioinformatics. 2020 Feb 15;36(4):1074-1081. doi: 10.1093/bioinformatics/btz734.

Prediction of protein crotonylation sites through LightGBM classifier based on SMOTE and elastic net.

Anal Biochem. 2020 Nov 15;609:113903. doi: 10.1016/j.ab.2020.113903. Epub 2020 Aug 15.

Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis.

Comput Biol Med. 2021 Jul;134:104516. doi: 10.1016/j.compbiomed.2021.104516. Epub 2021 Jun 1.

Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.

Comput Biol Med. 2020 Aug;123:103899. doi: 10.1016/j.compbiomed.2020.103899. Epub 2020 Jul 15.

Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure.

Genomics. 2019 Dec;111(6):1839-1852. doi: 10.1016/j.ygeno.2018.12.007. Epub 2018 Dec 11.

DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Interdiscip Sci. 2022 Jun;14(2):311-330. doi: 10.1007/s12539-021-00488-7. Epub 2021 Nov 3.

PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy.

Anal Biochem. 2022 Dec 1;658:114935. doi: 10.1016/j.ab.2022.114935. Epub 2022 Oct 4.

Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences.

BMC Bioinformatics. 2016 Mar 3;17:116. doi: 10.1186/s12859-016-0959-z.

SGL-SVM: A novel method for tumor classification via support vector machine with sparse group Lasso.

J Theor Biol. 2020 Feb 7;486:110098. doi: 10.1016/j.jtbi.2019.110098. Epub 2019 Nov 28.

Cited by

Post-translational modifications of Keap1: the state of the art.

Front Cell Dev Biol. 2024 Jan 8;11:1332049. doi: 10.3389/fcell.2023.1332049. eCollection 2023.

Lysine 222 in PPAR γ1 functions as the key site of MuRF2-mediated ubiquitination modification.

Sci Rep. 2023 Feb 3;13(1):1999. doi: 10.1038/s41598-023-28905-5.

Multi-dimensional feature recognition model based on capsule network for ubiquitination site prediction.

PeerJ. 2022 Dec 6;10:e14427. doi: 10.7717/peerj.14427. eCollection 2022.

An analytical study on the identification of N-linked glycosylation sites using machine learning model.

PeerJ Comput Sci. 2022 Sep 21;8:e1069. doi: 10.7717/peerj-cs.1069. eCollection 2022.
