Su Peng, Zhao Yuxin, Li Xiaobo, Ma Zhendi, Wang Hui
School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.
Biomimetics (Basel). 2025 Aug 10;10(8):523. doi: 10.3390/biomimetics10080523.
Gene expression data are now widely used in medicine, particularly in cancer diagnosis and prognosis monitoring. Such data are typically high-dimensional and contain much redundant and noisy information, which makes models prone to the curse of dimensionality and to overfitting. To address this, the study introduces a novel two-stage hybrid ensemble equilibrium optimizer algorithm for gene selection. In the first stage, a hybrid approach combining multiple filters with gene correlation-based methods selects a candidate subset of genes; it evaluates the redundancy and complementary relationships among genes to obtain a subset with maximal information content. In the second stage, an equilibrium optimizer algorithm incorporating Gaussian Barebone and a novel gene pruning strategy searches for the optimal gene subset within the candidate gene space selected in the first stage. The proposed method was compared with nine feature selection techniques on 15 datasets. The results indicate that the ensemble filtering method in the first stage exhibits strong stability and effectively reduces the search space of the gene selection algorithms, and that the improved equilibrium optimizer algorithm enhances prediction accuracy while significantly reducing the number of selected features. These findings highlight the effectiveness of the proposed method as a valuable approach for gene selection.
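The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the filters, the correlation measure, or the pruning rule, so the two filters used here (a Fisher-like score and absolute Pearson correlation with the label), the mean-rank aggregation, the `max_corr` threshold, and all function names are assumptions chosen for illustration. The `gaussian_barebone_step` function shows the common bare-bones sampling form (Gaussian centred between a candidate and the best solution); how the paper embeds it in the equilibrium optimizer is likewise assumed.

```python
import math
import random
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def fisher_score(values, labels):
    """Two-class Fisher-like score; assumes labels are 0/1 and both classes occur."""
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    denom = pstdev(g0) ** 2 + pstdev(g1) ** 2
    return (mean(g0) - mean(g1)) ** 2 / (denom + 1e-12)


def ranks(scores):
    """Map each score to its rank, where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def ensemble_filter(genes, labels, k, max_corr=0.9):
    """Stage one (assumed form): aggregate filter ranks, then skip redundant genes.

    genes is a list of per-gene expression lists; returns up to k gene indices.
    """
    r1 = ranks([fisher_score(g, labels) for g in genes])
    r2 = ranks([abs(pearson(g, labels)) for g in genes])
    avg_rank = [(a + b) / 2 for a, b in zip(r1, r2)]
    selected = []
    for i in sorted(range(len(genes)), key=lambda i: avg_rank[i]):
        # Keep a gene only if it is not strongly correlated with one already kept.
        if all(abs(pearson(genes[i], genes[j])) < max_corr for j in selected):
            selected.append(i)
        if len(selected) == k:
            break
    return selected


def gaussian_barebone_step(position, best, rng):
    """Stage two building block (assumed form): sample around the midpoint of
    the candidate and the best solution, with spread equal to their distance."""
    return [rng.gauss((x + b) / 2, abs(x - b)) for x, b in zip(position, best)]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    gene_a = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]   # informative
    gene_b = [2 * v for v in gene_a]           # redundant copy of gene_a
    gene_c = [3.0, 1.0, 2.0, 2.0, 3.0, 1.0]   # uninformative
    print(ensemble_filter([gene_a, gene_b, gene_c], labels, k=2))
    print(gaussian_barebone_step([0.2, 0.8], [0.6, 0.4], random.Random(0)))
```

In this toy run only one of the two perfectly correlated genes survives the redundancy check, which is the kind of search-space reduction the first stage is meant to provide before the optimizer runs.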