Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences.

Author information

Cai Binghuang, Jiang Xia

Affiliations

Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15206-3701, USA.

Publication information

BMC Bioinformatics. 2016 Mar 3;17:116. doi: 10.1186/s12859-016-0959-z.

DOI:10.1186/s12859-016-0959-z

PMID:26940649

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4778322/

Abstract

BACKGROUND

Ubiquitination is an important post-translational modification of proteins and has been widely studied. Both experimental and computational methods have been developed to identify ubiquitination sites in protein sequences. This paper explores machine learning methods for predicting ubiquitination sites from the physicochemical properties (PCPs) of the amino acids in protein sequences.

RESULTS

We first establish six ubiquitination data sets, whose records contain both ubiquitination sites and non-ubiquitination sites in varying numbers of protein sequence segments. To build these data sets, protein sequence segments are extracted from the original protein sequences used in four published papers on ubiquitination, and 531 PCP features are calculated for each extracted segment from the PCP values in AAindex (the Amino Acid index database) by averaging each PCP over all amino acids in the segment. Seven machine learning methods are then applied to the six segment-PCP data sets: four Bayesian network methods (Naïve Bayes (NB), Feature Selection NB (FSNB), Model Averaged NB (MANB), and Efficient Bayesian Multivariate Classifier (EBMC)) and three regression methods (Support Vector Machine (SVM), Logistic Regression (LR), and Least Absolute Shrinkage and Selection Operator (LASSO)). Five-fold cross-validation and the Area Under the Receiver Operating Characteristic Curve (AUROC) are used to evaluate the prediction performance of each method. The results demonstrate that the PCP data of protein sequences contain information that machine learning methods can mine for ubiquitination site prediction. In the comparison, EBMC, SVM and LR perform better than the other methods, and EBMC is the only method that achieves an AUROC of at least 0.6 on all six data sets. EBMC also tends to perform better on larger data sets.
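The segment-level feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the window-based `extract_segment` helper, the two toy property tables, and their values are assumptions made for the example, whereas the paper computes 531 features from AAindex.

```python
from statistics import mean

# Hypothetical toy PCP tables: each maps an amino acid letter to one index value.
# Real AAindex entries cover all 20 residues; these numbers are illustrative only.
TOY_PCP_TABLES = {
    "toy_hydrophobicity": {"A": 1.8, "K": -3.9, "L": 3.8, "G": -0.4},
    "toy_volume": {"A": 88.6, "K": 168.6, "L": 166.7, "G": 60.1},
}


def extract_segment(sequence, site, flank):
    """Return the residues within `flank` positions of 0-based index `site`.

    Using a window centred on the candidate site is an assumption of this sketch.
    """
    start = max(0, site - flank)
    return sequence[start:site + flank + 1]


def segment_pcp_features(segment, pcp_tables):
    """Average each PCP over all residues of the segment (one feature per PCP)."""
    return {
        name: mean(table[residue] for residue in segment)
        for name, table in pcp_tables.items()
    }


# Candidate lysine (K) at index 3 with two flanking residues on each side.
segment = extract_segment("GALKLAG", site=3, flank=2)
features = segment_pcp_features(segment, TOY_PCP_TABLES)
```

Each segment becomes one record with as many numeric features as there are property tables, so swapping the two toy tables for the full set of AAindex properties would give the 531-feature representation the abstract describes.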

CONCLUSIONS

Machine learning methods were applied to ubiquitination site prediction based on the physicochemical properties of the amino acids in protein sequences. The results demonstrate that machine learning can extract useful information from PCP data on protein sequences, and that EBMC, SVM and LR (especially EBMC) outperform the other methods compared for ubiquitination prediction.
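The evaluation protocol named in the Results (five-fold cross-validation scored by AUROC) can be sketched in plain Python as below. Several parts are assumptions of this sketch rather than details from the paper: the folds are stratified by class, the data are synthetic, and the nearest-centroid scorer is only a stand-in for the seven methods actually compared.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def stratified_folds(labels, k=5, seed=0):
    """Deal shuffled indices of each class into k folds (stratification is an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def fit_centroids(rows, labels):
    """Placeholder model: per-class mean feature vectors (negative, positive)."""
    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(column) / len(members) for column in zip(*members)]
    return centroid(0), centroid(1)


def score_centroids(model, row):
    """Higher score means the row lies closer to the positive centroid."""
    negative, positive = model

    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))

    return squared_distance(negative) - squared_distance(positive)


def cross_validated_auroc(features, labels, fit, score, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and return one AUROC per fold."""
    results = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        fold_scores = [score(model, features[i]) for i in held_out]
        results.append(auroc([labels[i] for i in held_out], fold_scores))
    return results


# Synthetic data: 20 positive and 20 negative records with two features each.
rng = random.Random(1)
labels = [1] * 20 + [0] * 20
features = [[rng.gauss(y, 1.0), rng.gauss(0.0, 1.0)] for y in labels]
fold_aurocs = cross_validated_auroc(features, labels, fit_centroids, score_centroids)
mean_auroc = sum(fold_aurocs) / len(fold_aurocs)
```

Passing a different `fit`/`score` pair to `cross_validated_auroc` is how several classifiers would be compared on the same folds, which mirrors the per-method, per-data-set comparison the abstract reports.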

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e02b/4778322/a7d9d6b8dba2/12859_2016_959_Fig1_HTML.jpg
