Department of Computer Engineering, Yildiz Technical University, Istanbul, Turkey.
Genomics. 2019 Jul;111(4):669-686. doi: 10.1016/j.ygeno.2018.04.004. Epub 2018 Apr 14.
In cancer classification, gene selection is an important data preprocessing technique, but it is difficult because the search space is very large. The objective of this study is therefore to develop a hybrid meta-heuristic model for gene selection that combines the Binary Black Hole Algorithm (BBHA) with Binary Particle Swarm Optimization (BPSO) (4-2). In this model, the BBHA is embedded in the BPSO (4-2) algorithm to strengthen its exploration and exploitation and thereby improve its performance. The model is combined with the Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA), k-nearest neighbor, and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarray datasets. The experimental results and statistical analysis confirm that the BPSO (4-2)-BBHA outperforms the BBHA, the BPSO (4-2), and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy, and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets.
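To make the idea of embedding a black-hole step inside a binary PSO loop more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the BBHA is embedded or what the "(4-2)" variant denotes, so everything here is an assumption: the sigmoid-transfer BPSO update, the Hamming-distance event horizon (`horizon`), all parameter values, and the names `toy_fitness` and `hybrid_bpso_bbha`. The toy fitness function is a hypothetical stand-in for classifier accuracy on the selected genes.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_fitness(mask, informative):
    # Hypothetical stand-in for classifier accuracy: reward informative
    # genes that are selected and penalize the size of the subset.
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)


def hybrid_bpso_bbha(n_genes, fitness, n_particles=20, n_iter=50,
                     w=0.7, c1=1.5, c2=1.5, v_max=4.0, horizon=3, seed=0):
    """Assumed hybrid: a BPSO update followed by a black-hole step."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    k_best = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[k_best][:], pbest_fit[k_best]

    def record(k):
        # Evaluate particle k and update personal and global bests.
        nonlocal gbest, gbest_fit
        f = fitness(pos[k])
        if f > pbest_fit[k]:
            pbest[k], pbest_fit[k] = pos[k][:], f
        if f > gbest_fit:
            gbest, gbest_fit = pos[k][:], f

    for _ in range(n_iter):
        # BPSO step: velocity update, then resample each bit with
        # probability sigmoid(velocity).
        for k in range(n_particles):
            for j in range(n_genes):
                v = (w * vel[k][j]
                     + c1 * rng.random() * (pbest[k][j] - pos[k][j])
                     + c2 * rng.random() * (gbest[j] - pos[k][j]))
                vel[k][j] = max(-v_max, min(v_max, v))
                pos[k][j] = 1 if rng.random() < sigmoid(vel[k][j]) else 0
            record(k)

        # Black-hole step (assumption): the global best acts as the black
        # hole; particles closer than `horizon` bits are absorbed and
        # replaced by fresh random particles to keep the swarm diverse.
        for k in range(n_particles):
            dist = sum(a != b for a, b in zip(pos[k], gbest))
            if dist < horizon:
                pos[k] = [rng.randint(0, 1) for _ in range(n_genes)]
                vel[k] = [0.0] * n_genes
                record(k)

    return gbest, gbest_fit


# Usage on a toy problem with 30 genes, 5 of which are informative.
informative = {2, 7, 11, 19, 23}
mask, score = hybrid_bpso_bbha(30, lambda m: toy_fitness(m, informative))
print(sorted(i for i, bit in enumerate(mask) if bit), score)
```

In a real pipeline the fitness call would wrap a cross-validated classifier restricted to the genes where the mask is 1, and the candidate genes would first be reduced by a pre-filter such as RF-RFE; both are omitted here to keep the sketch self-contained.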