Zhang Yijie, Cai Yuhang
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
Biomimetics (Basel). 2024 Oct 21;9(10):648. doi: 10.3390/biomimetics9100648.
The high dimensionality of large datasets can severely degrade the data mining process. Feature selection is therefore an essential preprocessing stage: it reduces the dimensionality of a dataset by selecting the most informative features while improving classification accuracy. This paper proposes a novel binary Gray Wolf Optimization algorithm for feature selection in classification tasks. First, because a search agent's historical best position helps it explore more promising regions, the algorithm linearly combines the best positions of the search agents to increase its exploration capability and thereby strengthen its global search ability. Second, a novel quadratic interpolation technique, which integrates population diversity with local exploitation, improves both the diversity of the population and the convergence accuracy. Third, chaotic perturbations (small random fluctuations) applied to the convergence factor during the exploration phase help avoid premature convergence and promote exploration of the search space. Finally, a novel transfer function processes feature information differently at different stages of the search, enabling the algorithm to search and optimize effectively in the binary space and thus select the optimal feature subset. The proposed method uses a k-nearest neighbor classifier and is evaluated with 10-fold cross-validation on 32 datasets. Experimental comparisons with other advanced algorithms demonstrate the effectiveness of the proposed algorithm.
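To make the overall mechanism concrete, the following is a minimal sketch of a generic binary Gray Wolf Optimization loop for feature selection. It is not the authors' implementation: the abstract gives no formulas, so the S-shaped transfer function, the logistic-map perturbation and its 0.1 scale, the weighted fitness, and every name here (`sigmoid_transfer`, `chaotic_convergence_factor`, `toy_fitness`, `binary_gwo`) are assumptions chosen for illustration. The historical-best combination and the quadratic interpolation step described above are omitted, and `toy_fitness` is a stand-in for the k-nearest neighbor error under 10-fold cross-validation.

```python
import math
import random


def sigmoid_transfer(x):
    # Generic S-shaped transfer function (assumption; not the paper's novel one).
    return 1.0 / (1.0 + math.exp(-x))


def chaotic_convergence_factor(t, max_iter, chaos):
    # Linearly decreasing factor from 2 to 0, scaled by a small chaotic term.
    # `chaos` is a logistic-map value in (0, 1); the 0.1 scale is illustrative.
    a = 2.0 * (1.0 - t / max_iter)
    return a * (1.0 + 0.1 * (chaos - 0.5))


def toy_fitness(mask, relevant=(0, 3, 5), weight=0.9):
    # Hypothetical stand-in for a wrapper fitness: a real run would replace
    # `missed` with the cross-validated classification error of the subset.
    missed = sum(1 for j in relevant if mask[j] == 0) / len(relevant)
    size = sum(mask) / len(mask)
    return weight * missed + (1.0 - weight) * size


def binary_gwo(fitness, dim, n_wolves=8, max_iter=30, seed=0):
    # Requires n_wolves >= 3 so that alpha, beta and delta leaders exist.
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_wolves)]
    chaos = 0.7  # arbitrary logistic-map seed (assumption)
    best, best_fit = None, float("inf")

    for t in range(max_iter + 1):
        # Rank the pack and remember the best mask seen so far.
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = ranked[:3]
        alpha_fit = fitness(alpha)
        if alpha_fit < best_fit:
            best, best_fit = alpha[:], alpha_fit
        if t == max_iter:
            break  # the final population has been evaluated

        chaos = 4.0 * chaos * (1.0 - chaos)  # logistic map update
        a = chaotic_convergence_factor(t, max_iter, chaos)

        new_wolves = []
        for w in wolves:
            new_w = []
            for j in range(dim):
                # Average the continuous moves toward the three leaders.
                x = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    x += leader[j] - A * D
                x /= 3.0
                # Map the continuous value to a bit via the transfer function.
                prob = sigmoid_transfer(10.0 * (x - 0.5))
                new_w.append(1 if rng.random() < prob else 0)
            new_wolves.append(new_w)
        wolves = new_wolves

    return best, best_fit


best_mask, best_value = binary_gwo(toy_fitness, dim=10)
print(best_mask, best_value)
```

In this sketch each wolf is a 0/1 mask over the features, and the transfer function is the only place where the continuous update is turned into a binary decision. The abstract describes that step as the one the proposed method handles differently at different stages of the search.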