Cai Ting, Ma Fan, Ye Zhiwei, Zhou Wen, Wang Mingwei, He Qiyi, Pan Hu, Shen Jun
School of Computer Science, Hubei University of Technology, Wuhan, 430068, China.
Hubei Key Laboratory of Green Intelligent Computing Power Network, Wuhan, China.
Sci Rep. 2025 Mar 7;15(1):8001. doi: 10.1038/s41598-024-78758-9.
Evolutionary algorithms are a primary approach to feature selection, but most are designed to find a single feature subset. However, the optimal feature subset within a dataset is often not unique, which means that feature selection is a multimodal problem, and representing the data with a single feature subset introduces bias. Most existing evolutionary algorithms also suffer from a lack of diversity, which makes them insufficiently effective at finding multiple optimal solutions. To address this issue, this paper investigates a new evolutionary algorithm derived from heterosis theory, the hybrid breeding optimization algorithm (HBO). HBO is combined with a dynamic niching technique to form the proposed double-stage multimodal hybrid breeding optimization (DSMHBO). Further, to enhance the performance of the traditional HBO, neighborhood search and elite mutation strategies are introduced in the global search, and a neighborhood crossover strategy is applied to broaden population diversity. When the number of niches is set to 1, DSMHBO is equivalent to the double-stage hybrid breeding optimization (DSHBO). Finally, eight algorithms, including DSHBO, cuckoo search (CS), and the fruit fly algorithm (FA), are compared on 13 datasets. DSHBO achieves the best average classification accuracy (ACA) on 7 datasets and the best highest classification accuracy (HCA) on 10 datasets, significantly surpassing the comparison algorithms. In addition, the proposed DSMHBO is compared with recently proposed algorithms, such as the whale optimization algorithm (WOA) and the Harris hawk optimization algorithm (HHO), on 10 datasets. DSMHBO achieves average ACA and HCA values of 93.54% and 95.52%, much higher than the comparison models. It can also identify up to 187 feature subsets on the Lung Cancer dataset, which indicates its ability to locate multiple peaks. Moreover, even as the error level increases, the global search capability of DSMHBO remains superior to that of the other algorithms, indicating that DSMHBO is an effective method for multimodal feature selection.
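The claim that the optimal feature subset is often not unique can be illustrated with a minimal sketch. The toy data, the leave-one-out 1-nearest-neighbor scorer, and the function names below are assumptions made for illustration only; none of them come from the paper. The sketch enumerates every subset exhaustively, so it is not an implementation of HBO, DSHBO, or DSMHBO. Feature 1 is a near copy of feature 0 and feature 2 is noise, so two different single-feature subsets tie for the best score.

```python
from itertools import product

# Hypothetical toy data: feature 1 is nearly redundant with feature 0,
# feature 2 is uninformative noise. Labels follow features 0 and 1.
X = [
    [0.0, 0.1, 5.0],
    [0.2, 0.0, 1.0],
    [0.1, 0.2, 9.0],
    [1.0, 1.1, 4.8],
    [1.2, 1.0, 1.2],
    [1.1, 1.2, 9.1],
]
y = [0, 0, 0, 1, 1, 1]


def loo_1nn_accuracy(mask):
    """Leave-one-out 1-NN accuracy using only the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample under squared Euclidean distance on cols.
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in cols),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def optimal_subsets():
    """Return every non-empty mask tied for best (accuracy, fewest features)."""
    masks = [m for m in product((0, 1), repeat=len(X[0])) if any(m)]
    scored = {m: (loo_1nn_accuracy(m), -sum(m)) for m in masks}
    best = max(scored.values())
    return [m for m in masks if scored[m] == best]


print(optimal_subsets())  # [(0, 1, 0), (1, 0, 0)]
```

Both returned masks reach the same accuracy with one feature each, so a method that reports only one of them hides an equally good alternative. Exhaustive enumeration is only feasible for a handful of features, which is why population-based search with niching is used on realistic datasets.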