Zheng Wei, Chen Shuo, Fu Zhenyong, Zhu Fa, Yan Hui, Yang Jian
IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4562-4574. doi: 10.1109/TNNLS.2021.3058172. Epub 2022 Aug 31.
Feature selection aims to retain strongly relevant features and discard the rest. Recently, embedded feature selection methods, which incorporate feature-weight learning into the training of a classifier, have attracted much attention. However, traditional embedded methods focus only on the combinatorial optimality of the selected feature set: they sometimes select weakly relevant features that happen to combine well while leaving out strongly relevant ones, thereby degrading generalization performance. To address this issue, we propose a novel embedded framework for feature selection, termed feature selection boosted by unselected features (FSBUF). Specifically, we introduce an extra classifier for the unselected features into the traditional embedded model and jointly learn the feature weights so as to maximize the classification loss of the unselected features. As a result, the extra classifier recycles unselected strongly relevant features, which replace weakly relevant features in the selected subset. The final objective can be formulated as a minimax optimization problem, and we design an effective gradient-based algorithm to solve it. Furthermore, we theoretically prove that FSBUF improves the generalization ability of traditional embedded feature selection methods. Extensive experiments on synthetic and real-world data sets demonstrate the interpretability and superior performance of FSBUF.
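The abstract does not give the concrete objective, but the core idea — learn soft feature weights jointly with a classifier on the selected features while an extra classifier on the unselected features is trained adversarially — can be illustrated with a minimal toy sketch. Everything below (logistic losses, a soft weight vector `w` in `[0, 1]`, alternating gradient steps, the variable names) is an assumed simplification for illustration, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: features 0 and 1 are strongly relevant, the rest are noise.
n, d = 200, 5
X = rng.standard_normal((n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# w: soft feature-selection weights in [0, 1] (assumed relaxation);
# theta_s: classifier on the selected (w-weighted) features;
# theta_u: extra classifier on the unselected ((1 - w)-weighted) features.
w = np.full(d, 0.5)
theta_s = np.zeros(d)
theta_u = np.zeros(d)
lr = 0.1

for _ in range(500):
    Xs, Xu = X * w, X * (1.0 - w)
    ps, pu = sigmoid(Xs @ theta_s), sigmoid(Xu @ theta_u)
    # Both classifiers descend their own logistic losses.
    theta_s -= lr * (Xs.T @ (ps - y)) / n
    theta_u -= lr * (Xu.T @ (pu - y)) / n
    # Minimax step for the weights: minimize the selected-feature loss
    # while MAXIMIZING the unselected-feature loss, so strongly relevant
    # features that the extra classifier can still exploit get pulled in.
    gs = (X.T @ (ps - y)) / n * theta_s    # d loss_s / d w   (since Xs = X * w)
    gu = -(X.T @ (pu - y)) / n * theta_u   # d loss_u / d w   (since Xu = X * (1 - w))
    w = np.clip(w - lr * (gs - gu), 0.0, 1.0)

print(np.round(w, 2))  # relevant features should end up with larger weights
```

On this toy problem the weights of the two relevant features are driven toward 1 while the noise weights stay near their initial value, mimicking how the adversarial unselected-feature loss keeps strongly relevant features from being left out.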