IEEE Trans Neural Netw Learn Syst. 2017 Jul;28(7):1490-1507. doi: 10.1109/TNNLS.2016.2551724. Epub 2016 Apr 22.
Feature selection (FS) is an important component of many pattern recognition tasks. In these tasks, one is often confronted with very high-dimensional data. FS algorithms are designed to identify the relevant feature subset from the original features, which can facilitate subsequent analysis, such as clustering and classification. Structured sparsity-inducing feature selection (SSFS) methods have been widely studied in the last few years, and a number of algorithms have been proposed. However, there is no comprehensive study of the connections between different SSFS methods or of how they have evolved. In this paper, we survey various SSFS methods, including their motivations and mathematical representations. We then explore the relationships among the different formulations and propose a taxonomy to elucidate their evolution. We group the existing SSFS methods into two categories: vector-based feature selection (feature selection based on lasso) and matrix-based feature selection (feature selection based on the l_{r,p}-norm). Furthermore, FS has been combined with other machine learning algorithms for specific applications, such as multitask learning, multilabel learning, multiview learning, classification, and clustering. This paper not only compares the differences and commonalities of these methods in terms of their regression and regularization strategies, but also offers practitioners in related fields practical guidelines for performing feature selection.
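The distinction between vector-based and matrix-based selection can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a least-squares loss, a plain proximal-gradient (ISTA) solver, and the l_{2,1}-norm as one commonly used matrix penalty. All function names (`soft_threshold`, `row_soft_threshold`, `proximal_gradient`, `selected_rows`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(W, t):
    """Proximal operator of t * ||W||_1: shrinks every entry independently (lasso-style)."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)


def row_soft_threshold(W, t):
    """Proximal operator of t * ||W||_{2,1}: shrinks the l2 norm of each row by t."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Rows whose norm is at most t are set to zero; the floor avoids division by zero.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale


def proximal_gradient(X, Y, lam, prox, n_iter=500):
    """Minimise 0.5 * ||X W - Y||_F^2 + lam * penalty(W) by ISTA (assumed solver)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = prox(W - step * grad, step * lam)
    return W


def selected_rows(W, tol=1e-8):
    """Indices of features (rows of W) whose coefficient row is not numerically zero."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol).tolist()


# Toy problem: two features, two regression targets, identity design matrix.
X = np.eye(2)
Y = np.array([[3.0, 0.5],
              [0.9, 0.9]])

W_l1 = proximal_gradient(X, Y, lam=1.0, prox=soft_threshold)
W_l21 = proximal_gradient(X, Y, lam=1.0, prox=row_soft_threshold)

print("entrywise (l1) penalty keeps features:", selected_rows(W_l1))
print("row-wise (l2,1) penalty keeps features:", selected_rows(W_l21))
```

In this toy setting the entrywise penalty drops the second feature because each of its coefficients is small on its own, whereas the row-wise penalty keeps it because its coefficients are jointly large across the two targets. Sharing evidence across targets in this way is the usual motivation for a matrix norm in multitask or multilabel settings.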