Department of SASL (Mathematics), VIT Bhopal University, Sehore, India.
J Comput Biol. 2022 Jun;29(6):565-584. doi: 10.1089/cmb.2021.0410. Epub 2022 May 6.
Designing an optimal framework for predicting cancer from high-dimensional, imbalanced microarray data is a challenging task in bioinformatics and machine learning. Many dimensionality reduction techniques exist, but it is unclear which of them performs best with different classifiers and datasets. This article focuses on independent component analysis (ICA) as a feature (gene) extraction method for Naïve Bayes (NB) classification of microarray data, because ICA extracts independent components from the datasets, which matches the independence assumption underlying the NB classifier. A novel hybrid method based on nature-inspired metaheuristic algorithms is proposed for optimizing the ICA-extracted genes. The method combines the cuckoo search (CS) algorithm and the artificial bee colony (ABC) algorithm to find the best subset of features and thereby increase the performance of ICA for the NB classifier. To the best of our knowledge, this is the first time CS-ABC has been combined with ICA to address dimensionality reduction in high-dimensional microarray biomedical datasets. The CS algorithm improves the local search process of the ABC algorithm, and the resulting hybrid CS-ABC algorithm provides better gene sets that improve the classification accuracy of the NB classifier. The experimental comparison shows that the CS-ABC approach with ICA performs a deeper search in the iterative process, which can avoid premature convergence and produces better results than previously published feature selection algorithms for the NB classifier.
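The wrapper idea described above (search over subsets of extracted features, scoring each subset by NB accuracy) can be illustrated with a minimal sketch. This is not the authors' CS-ABC implementation: the abstract gives no update rules, so the single-bit "ABC-style" move, the heavy-tailed "CS-style" move (a Pareto draw standing in for a Lévy flight), the nest-abandonment step, the hold-out fitness, and all function names and parameters below are assumptions made for illustration. The ICA step is omitted; the columns of the toy data merely stand in for ICA components.

```python
import math
import random

def fit_gaussian_nb(X, y, mask):
    """Estimate per-class log-prior, means and variances on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    model = {}
    for label in set(y):
        rows = [[x[j] for j in cols] for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(c) / n for c in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in c) / n + 1e-6  # small floor avoids zero variance
                 for c, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(y)), means, varis)
    return cols, model

def predict(cols, model, x):
    """Return the class with the highest Gaussian NB log-posterior."""
    def log_post(params):
        prior, means, varis = params
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
                           for j, m, v in zip(cols, means, varis))
    return max(model, key=lambda label: log_post(model[label]))

def fitness(mask, train, valid):
    """Hold-out accuracy of NB trained on the masked features (assumed objective)."""
    if not any(mask):
        return 0.0
    cols, model = fit_gaussian_nb(train[0], train[1], mask)
    hits = sum(predict(cols, model, x) == t for x, t in zip(*valid))
    return hits / len(valid[1])

def flip(mask, k, rng):
    """Copy the mask and toggle k randomly chosen positions."""
    new = list(mask)
    for j in rng.sample(range(len(mask)), k):
        new[j] = not new[j]
    return new

def hybrid_search(train, valid, n_features, n_nests=8, n_iter=30, seed=0):
    """Toy hybrid search over binary feature masks (hypothetical, not the paper's rules)."""
    rng = random.Random(seed)
    new_mask = lambda: [rng.random() < 0.5 for _ in range(n_features)]
    nests = [new_mask() for _ in range(n_nests)]
    scores = [fitness(m, train, valid) for m in nests]
    for _ in range(n_iter):
        for i in range(n_nests):
            # Heavy-tailed number of flips: a crude stand-in for a Levy flight.
            k_levy = min(n_features, int(rng.paretovariate(1.5)))
            # Try an ABC-style one-bit neighbour, then a CS-style long jump.
            for cand in (flip(nests[i], 1, rng), flip(nests[i], k_levy, rng)):
                s = fitness(cand, train, valid)
                if s >= scores[i]:  # greedy acceptance
                    nests[i], scores[i] = cand, s
        # CS-style abandonment: re-seed the worst nest, never the best one.
        worst = min(range(n_nests), key=scores.__getitem__)
        best = max(range(n_nests), key=scores.__getitem__)
        if worst != best:
            nests[worst] = new_mask()
            scores[worst] = fitness(nests[worst], train, valid)
    best = max(range(n_nests), key=scores.__getitem__)
    return nests[best], scores[best]

def make_toy(n, rng):
    """Synthetic data: columns 0 and 1 are informative, columns 2 to 5 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(label)
    return X, y

data_rng = random.Random(1)
train, valid = make_toy(40, data_rng), make_toy(40, data_rng)
best_mask, best_acc = hybrid_search(train, valid, n_features=6)
print("selected columns:", [j for j, keep in enumerate(best_mask) if keep])
print("hold-out accuracy:", best_acc)
```

In a real pipeline the toy columns would be replaced by components from an ICA implementation, and the single hold-out split would more plausibly be replaced by cross-validation, since microarray datasets typically have few samples.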