Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique.

Author information

Hassan Hebatallah, Badr Amr, Abdelhalim M B

Affiliations

Department of Computer Science, College of Computing and Information Technology, Arab Academy for Science and Technology and Maritime Transport (AASTMT), Cairo, Egypt.

Department of Computer Science, Faculty of Computers and Information, Cairo University, Cairo, Egypt.

Publication information

Bioinform Biol Insights. 2015 Jul 5;9:103-9. doi: 10.4137/BBI.S26864. eCollection 2015.

Abstract

O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs on specific serine (S) or threonine (T) residues. Several O-glycosylation site predictors have been developed, but better prediction tools are still needed. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. Because the PSO parameter settings used during training affect classification accuracy, in this paper we optimize the PSO parameters with a genetic algorithm in order to increase classification accuracy. Our proposed genetic algorithm-based approach showed better performance than existing predictors in terms of the area under the receiver operating characteristic curve (AUC). In addition, we implemented a glycosylation predictor tool based on this approach and demonstrated that it could successfully identify candidate glycosylation sites in a case-study protein.
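
The abstract does not state which PSO parameters were tuned or how PSO and RF are coupled, so the following is only a minimal sketch of the general idea of GA-tuned PSO, not the authors' method. It assumes the tuned parameters are the standard inertia weight `w` and acceleration coefficients `c1`, `c2` (with assumed ranges in `PARAM_BOUNDS`), and it scores each candidate by the best value a PSO run reaches on the sphere function, a hypothetical stand-in. In the paper's setting the fitness would instead come from the PSO+RF classifier (for example its AUC on the imbalanced training data), which is not reproduced here. Truncation selection, uniform crossover, and Gaussian mutation are illustrative choices.

```python
import random

# Hypothetical stand-in objective (lower is better); the paper's real fitness
# would be a classifier score from the PSO+RF pipeline, not reproduced here.
def sphere(x):
    return sum(v * v for v in x)

def pso(objective, dim, w, c1, c2, n_particles=20, iters=50, bounds=(-5.0, 5.0), rng=None):
    """Global-best PSO minimizing `objective`; returns (best_position, best_value)."""
    rng = rng or random.Random(0)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Assumed search ranges for the GA genes (w, c1, c2).
PARAM_BOUNDS = [(0.1, 1.0), (0.0, 2.5), (0.0, 2.5)]

def fitness(params, seed):
    """Score a (w, c1, c2) candidate by the best value one PSO run achieves."""
    w, c1, c2 = params
    # The same seed for every candidate gives every PSO run the same random stream.
    _, best = pso(sphere, dim=5, w=w, c1=c1, c2=c2, rng=random.Random(seed))
    return best

def ga_tune(pop_size=12, generations=10, mut_rate=0.2, seed=42):
    """Simple GA over PSO parameters; returns (best_params, best_fitness)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in PARAM_BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(p, seed))
        elite = scored[: pop_size // 2]  # truncation selection, elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            for k, (lo, hi) in enumerate(PARAM_BOUNDS):
                if rng.random() < mut_rate:  # Gaussian mutation, clamped to bounds
                    child[k] = min(hi, max(lo, child[k] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda p: fitness(p, seed))
    return best, fitness(best, seed)

if __name__ == "__main__":
    params, score = ga_tune()
    print("w=%.3f c1=%.3f c2=%.3f -> best PSO value %.2e" % (*params, score))
```

Calling `ga_tune()` returns the best `(w, c1, c2)` found together with its PSO score. Fixing the seed inside `fitness` means every candidate faces the same random draws, so score differences reflect the parameters rather than luck, and carrying the top half of each generation forward means the best score can never get worse from one generation to the next.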

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b737/4494626/d442fd6b7c1f/bbi-9-2015-103f1.jpg

Similar articles

1. Improved PSO_AdaBoost Ensemble Algorithm for Imbalanced Data. Sensors (Basel). 2019 Mar 26;19(6):1476. doi: 10.3390/s19061476.
2. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification. PLoS One. 2016 Jan 14;11(1):e0146116. doi: 10.1371/journal.pone.0146116. eCollection 2016.
3. A Hybrid Intrusion Detection Model Using EGA-PSO and Improved Random Forest Method. Sensors (Basel). 2022 Aug 10;22(16):5986. doi: 10.3390/s22165986.
4. Genetic Learning Particle Swarm Optimization. IEEE Trans Cybern. 2016 Oct;46(10):2277-2290. doi: 10.1109/TCYB.2015.2475174. Epub 2015 Sep 17.
5. A Swarm Optimization Genetic Algorithm Based on Quantum-Behaved Particle Swarm Optimization. Comput Intell Neurosci. 2017;2017:2782679. doi: 10.1155/2017/2782679. Epub 2017 May 25.
6. An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data. Biomed Res Int. 2018 Aug 30;2018:7538204. doi: 10.1155/2018/7538204. eCollection 2018.
