Azzam Safaa M, Emam O E, Abolaban Ahmed Sabry
Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, P.O. Box 11795, Helwan, Egypt.
Sci Rep. 2024 Jun 12;14(1):13517. doi: 10.1038/s41598-024-63328-w.
Feature selection plays an important role as a preprocessing step for machine learning and data mining. It aims to streamline high-dimensional data by eliminating irrelevant and redundant features, which mitigates the curse of dimensionality in large datasets. On datasets with many features, algorithms that search for the most valuable features to improve classification accuracy can become trapped in local optima. Many studies have addressed this problem, and one line of solutions uses meta-heuristic techniques. This paper presents a combination of the Differential Evolution and Sailfish Optimizer algorithms (DESFO) to tackle the feature selection problem. To assess the effectiveness of the proposed algorithm, it is compared with Differential Evolution, the Sailfish Optimizer, and nine other modern optimization algorithms. The evaluation used the Random Forest and k-nearest neighbors classifiers as quality measures. The experimental results show that the proposed algorithm outperforms the others. It delivers high classification accuracy, reaching 85.7% with the Random Forest classifier and 100% with the k-nearest neighbors classifier across 14 multi-scale benchmarks. On fitness values, it reaches 71% with the Random Forest classifier and 85.7% with the k-nearest neighbors classifier.
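To make the wrapper idea concrete, the following is a minimal sketch of meta-heuristic feature selection using plain Differential Evolution only. It is not the DESFO hybrid, whose update rules are not given in this abstract. Everything here is an assumption for illustration: the function names, the 0.5 threshold that turns a real-valued vector into a feature mask, the weighted fitness (classification error plus a small penalty on the fraction of selected features, a form common in the feature selection literature), the leave-one-out 1-nearest-neighbor classifier standing in for the paper's classifiers, and the synthetic toy data.

```python
import random

def knn_loo_error(X, y, mask):
    """Leave-one-out 1-nearest-neighbor error using only features where mask is 1."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 1.0  # assumption: an empty feature subset counts as worst case
    errors = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        errors += best_label != y[i]
    return errors / len(X)

def fitness(vec, X, y, alpha=0.99):
    """Assumed fitness (lower is better): weighted error plus feature-ratio penalty."""
    mask = [1 if v > 0.5 else 0 for v in vec]
    ratio = sum(mask) / len(mask)
    return alpha * knn_loo_error(X, y, mask) + (1 - alpha) * ratio

def de_feature_selection(X, y, pop_size=10, gens=20, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over vectors in [0, 1]^dim; returns (binary mask, fitness)."""
    rng = random.Random(seed)
    dim = len(X[0])
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(p, X, y) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            s = fitness(trial, X, y)
            if s <= scores[i]:  # greedy selection: keep the trial if not worse
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda k: scores[k])
    return [1 if v > 0.5 else 0 for v in pop[best]], scores[best]

def make_toy_data(n=40, seed=1):
    """Synthetic data: feature 0 tracks the label, features 1-4 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.1)] + [rng.random() for _ in range(4)])
        y.append(label)
    return X, y

X, y = make_toy_data()
mask, score = de_feature_selection(X, y)
print("selected mask:", mask, "fitness:", score)
```

Thresholding a continuous vector lets an unmodified continuous optimizer search the binary space of feature subsets; a hybrid such as DESFO would replace or augment the update step while the wrapper fitness could stay the same.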