Kamalov Firuz, Sulieman Hana, Moussa Sherif, Reyes Jorge Avante, Safaraliev Murodbek
Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates.
Department of Mathematics and Statistics, American University of Sharjah, Sharjah, United Arab Emirates.
Heliyon. 2023 Sep 9;9(9):e19686. doi: 10.1016/j.heliyon.2023.e19686. eCollection 2023 Sep.
It has been shown that while feature selection algorithms are able to distinguish between relevant and irrelevant features, they fail to differentiate between relevant features and redundant or correlated ones. To address this issue, we propose a highly effective approach, called Nested Ensemble Selection (NES), that is based on a combination of filter and wrapper methods. The proposed feature selection algorithm differs from existing filter-wrapper hybrid methods in its simplicity, efficiency, and precision. The new algorithm is able to separate the relevant variables from the irrelevant ones as well as from the redundant and correlated features. Furthermore, we provide a robust heuristic for identifying the optimal number of selected features, which remains one of the greatest challenges in feature selection. Numerical experiments on synthetic and real-life data demonstrate the effectiveness of the proposed method. The NES algorithm achieves perfect precision on the synthetic data and near-optimal accuracy on the real-life data. The proposed method is compared against several popular algorithms, including mRMR, Boruta, a genetic algorithm, recursive feature elimination, Lasso, and Elastic Net. The results show that NES significantly outperforms the benchmark algorithms, especially on multi-class datasets.
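The abstract does not describe the internals of NES, so the sketch below is not NES. It is a generic filter-wrapper hybrid, included only to illustrate the problem the abstract describes: a univariate filter scores a redundant copy of a relevant feature exactly as highly as the original, while a wrapper step rejects the copy because it adds no predictive value. The choices here are assumptions made for illustration: an absolute-Pearson-correlation filter, a greedy forward wrapper scored by hold-out least-squares error, and a 5% minimum relative improvement. The function names (`filter_scores`, `holdout_mse`, `filter_wrapper_select`) are hypothetical.

```python
import numpy as np

# Hypothetical illustration of a generic filter-wrapper hybrid; this is NOT the NES algorithm.


def filter_scores(X, y):
    """Filter step: absolute Pearson correlation of each column of X with y (univariate)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)


def holdout_mse(X, y, cols, split):
    """Wrapper criterion: fit OLS (with intercept) on rows [:split], return MSE on rows [split:]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    resid = y[split:] - A[split:] @ coef
    return float(np.mean(resid ** 2))


def filter_wrapper_select(X, y, top_k, min_gain=0.05):
    """Keep the top_k features by filter score, then add them greedily (best-scored first),
    accepting a feature only if it cuts the hold-out MSE by at least `min_gain` (relative)."""
    scores = filter_scores(X, y)
    candidates = np.argsort(-scores, kind="stable")[:top_k]
    split = len(y) // 2
    selected = []
    best = holdout_mse(X, y, selected, split)  # intercept-only baseline
    for c in candidates:
        trial = holdout_mse(X, y, selected + [int(c)], split)
        if trial < (1.0 - min_gain) * best:
            selected.append(int(c))
            best = trial
    return selected


# Synthetic demo: columns 0 and 1 are relevant, column 2 is an exact copy of column 0
# (redundant), and columns 3-5 are pure noise (irrelevant).
rng = np.random.default_rng(0)
n = 400
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
noise = rng.normal(size=(n, 3))
X = np.column_stack([x0, x1, x0.copy(), noise])
y = 3.0 * x0 + 2.0 * x1 + 0.5 * rng.normal(size=n)

print("filter scores:", np.round(filter_scores(X, y), 2))
selected = filter_wrapper_select(X, y, top_k=3)
print("selected:", selected)
```

In this setup the filter alone drops the noise columns but ranks the duplicate alongside column 0. The wrapper then keeps only one of the two, because the copy leaves the hold-out error unchanged. Real redundancy is usually partial correlation rather than an exact copy, which is where a fixed `min_gain` threshold becomes a delicate choice.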