Department of Computer Science, Faculty of Physical Sciences, Ahmadu Bello University, Zaria, Nigeria.
Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa.
PLoS One. 2023 Mar 17;18(3):e0282812. doi: 10.1371/journal.pone.0282812. eCollection 2023.
The feature selection problem is a field of study that requires approximate algorithms to identify discriminative and optimally combined features. The evaluation and suitability of the selected features are often analyzed using classifiers. These features are embedded in data that is increasingly generated from diverse sources such as social media, surveillance systems, network applications, and medical records. The high dimensionality of these datasets often degrades the quality of the selected feature combination. Binary optimization methods have been proposed in the literature to address this challenge. However, the underlying deficiencies of a single binary optimizer carry over to the quality of the selected features. Although hybrid methods have been proposed, most still inherit the design limitations of the individual methods they combine. To address this, we propose a novel hybrid binary optimization method capable of effectively selecting features from increasingly high-dimensional datasets. The approach uses a sub-population selective mechanism that dynamically assigns individuals to a two-level optimization process. The level-1 method first mutates items in the population and then reassigns them to a level-2 optimizer. The selective mechanism determines which sub-population is assigned to the level-2 optimizer based on the exploration and exploitation phases of the level-1 optimizer. In addition, we designed nested transfer (NT) functions and investigated their influence on the level-1 optimizer. The binary Ebola optimization search algorithm (BEOSA) is applied for the level-1 mutation, while the simulated annealing (SA) and firefly (FFA) algorithms are investigated for the level-2 optimizer. The resulting hybrids are HBEOSA-SA and HBEOSA-FFA, which are investigated with the NT functions, together with their corresponding variants HBEOSA-SA-NT and HBEOSA-FFA-NT, to which no NT is applied.
The hybrid methods were tested experimentally on high-dimensional datasets to address the challenge of feature selection. A comparative analysis was carried out to assess how their performance varies relative to low-dimensional datasets. The classification accuracies obtained for large-, medium-, and small-scale datasets are 0.995 using HBEOSA-FFA, 0.967 using HBEOSA-FFA-NT, and 0.953 using HBEOSA-FFA, respectively. The fitness and cost values for large-, medium-, and small-scale datasets are 0.066 and 0.934 using HBEOSA-FFA, 0.068 and 0.932 using HBEOSA-FFA, and 0.222 and 0.970 using HBEOSA-SA-NT, respectively. Findings from the study indicate that HBEOSA-SA, HBEOSA-FFA, HBEOSA-SA-NT, and HBEOSA-FFA-NT outperformed BEOSA.
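The two-level scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the level-1 step is a plain bit-flip mutation standing in for BEOSA, the level-2 step is a basic simulated annealing over single-bit flips, and the nested transfer functions are omitted. The fitness function (a weighted sum of a proxy error and the fraction of features kept, with a toy set of "relevant" features replacing a classifier), the rule that routes the worst half of the population to level 2 during early iterations and the best half later, and all names and parameter values (`ALPHA`, `RELEVANT`, `hybrid_select`, and so on) are assumptions made for illustration only.

```python
import math
import random

ALPHA = 0.99          # assumed weight on the error term (hypothetical value)
RELEVANT = {0, 3, 5}  # toy ground truth standing in for a real classifier


def fitness(mask):
    """Weighted sum of a proxy error and the fraction of features kept (lower is better)."""
    missed = sum(1 for i in RELEVANT if not mask[i])
    error = missed / len(RELEVANT)
    return ALPHA * error + (1 - ALPHA) * sum(mask) / len(mask)


def level1_mutate(mask, rng, rate=0.1):
    """Stand-in for the level-1 optimizer: flip each bit with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def level2_anneal(mask, rng, steps=50, temp=1.0, cooling=0.9):
    """Simulated annealing over single-bit flips; returns the best mask seen."""
    current, best = mask[:], mask[:]
    for _ in range(steps):
        cand = current[:]
        j = rng.randrange(len(cand))
        cand[j] = 1 - cand[j]
        delta = fitness(cand) - fitness(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if fitness(current) < fitness(best):
                best = current[:]
        temp *= cooling
    return best


def hybrid_select(n_features=10, pop_size=8, iters=20, seed=0):
    """Two-level loop: level-1 mutation, then level-2 refinement of a chosen sub-population."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for t in range(iters):
        # Level 1: keep a mutated individual only if it is not worse.
        for i, ind in enumerate(pop):
            cand = level1_mutate(ind, rng)
            if fitness(cand) <= fitness(ind):
                pop[i] = cand
        # Selective mechanism (assumed rule): the first half of the run is treated
        # as exploration and sends the worst half to level 2; the second half is
        # treated as exploitation and sends the best half.
        pop.sort(key=fitness)
        half = pop_size // 2
        chosen = range(half, pop_size) if t < iters // 2 else range(half)
        for i in chosen:
            pop[i] = level2_anneal(pop[i], rng)
    return min(pop, key=fitness)


if __name__ == "__main__":
    best = hybrid_select()
    print("selected mask:", best, "fitness:", round(fitness(best), 4))
```

Because both levels only ever replace an individual with one that is no worse, the best fitness in the population cannot increase from one iteration to the next in this sketch. A real evaluation would replace the toy `fitness` with a classifier-based error measured on held-out data.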