Fang Lingling, Liang Xiyue
Department of Computing and Information Technology, Liaoning Normal University, Dalian, 116081 China.
J Bionic Eng. 2023;20(1):237-252. doi: 10.1007/s42235-022-00253-6. Epub 2022 Sep 7.
Feature selection (FS) is considered an important preprocessing step in data mining and is used to remove redundant or irrelevant features from high-dimensional data. Most optimization algorithms applied to FS problems do not balance their search well. This paper proposes a hybrid algorithm, the nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA), to address this problem. The proposed method introduces a new position-updating strategy that combines the position changes of the whale and grasshopper populations, which improves search diversity in the target domain. Ten distinct high-dimensional UCI datasets, the multi-modal Parkinson's speech datasets, and the COVID-19 symptom dataset are used to validate the proposed method. NL-BGWOA performs well on most of the high-dimensional datasets, with accuracy of up to 0.9895. The experimental results on the medical datasets also demonstrate the advantages of the proposed method on practical FS problems in terms of accuracy, feature-subset size, and fitness, with best values of 0.913, 5.7, and 0.0873, respectively. The results indicate that the proposed NL-BGWOA is comprehensively superior in solving the FS problem for high-dimensional data.
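The abstract reports accuracy, feature-subset size, and fitness but does not state how a candidate subset is encoded or scored. The sketch below shows one common wrapper-style formulation for binary swarm-based FS, not the authors' NL-BGWOA update rule. The sigmoid transfer function, the leave-one-out 1-nearest-neighbour evaluator, the weight `alpha = 0.99`, and all function names are assumptions made for illustration.

```python
import numpy as np

def sigmoid_binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Assumption: an S-shaped (sigmoid) transfer function; the abstract
    does not say which transfer function NL-BGWOA uses.
    """
    prob = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))
    return (rng.random(prob.shape) < prob).astype(int)

def loo_1nn_error(X, y, mask):
    """Leave-one-out 1-nearest-neighbour error rate on the selected features."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0  # an empty subset is treated as the worst case
    Xs = X[:, cols]
    # Pairwise squared Euclidean distances between all samples.
    dist = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample may not be its own neighbour
    predicted = y[dist.argmin(axis=1)]
    return float(np.mean(predicted != y))

def fs_fitness(X, y, mask, alpha=0.99):
    """Weighted sum of error rate and selected-feature ratio (lower is better).

    Assumption: alpha = 0.99 is a conventional choice, not a value
    taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    error = loo_1nn_error(X, y, mask)
    ratio = np.sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * ratio

if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is misleading.
    X = np.array([[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]])
    y = np.array([0, 0, 1, 1])
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, fs_fitness(X, y, mask))
```

Under this formulation a subset scores well only when it is both accurate and small, so the informative single feature in the toy data receives a lower fitness than either the misleading feature or the full feature set.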