Gómez-Verdejo Vanessa, Martínez-Ramón Manel, Arenas-García Jerónimo, Lázaro-Gredilla Miguel, Molina-Bulla Harold
Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid, Spain.
IEEE Trans Neural Netw. 2011 Aug;22(8):1269-83. doi: 10.1109/TNN.2011.2148727. Epub 2011 Jul 5.
This paper introduces a new support vector machine (SVM) formulation that yields sparse solutions in the primal SVM parameters, providing a new method for SVM-based feature selection. In addition to the classical SVM constraints, the new approach includes constraints that drop the weights associated with features that are likely to be irrelevant. A ν-SVM formulation is used, where ν indicates the fraction of features to be considered. Two versions of the proposed sparse classifier are presented, a 2-norm SVM and a 1-norm SVM, the latter having a lower computational burden than the former. The paper also explains how the approach can be readily extended to multiclass classification and to problems where groups of features, rather than isolated features, need to be selected. The algorithms have been tested on a variety of synthetic and real data sets and compared against other state-of-the-art SVM-based linear feature selection methods, such as the 1-norm SVM and the doubly regularized SVM. The results show the good feature selection ability of the proposed approaches.
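To make the idea of sparsity in the primal weights concrete, the following is a minimal sketch of the 1-norm SVM baseline named in the abstract, not of the paper's proposed ν-SVM formulation. It assumes the penalized form (mean hinge loss plus an L1 penalty on the weights) and trains it by proximal subgradient descent. The function names, the toy data, and the hyperparameters `lam`, `lr`, and `n_iter` are all illustrative assumptions.

```python
# Illustrative sketch only: L1-penalized hinge loss (a penalized form of the
# 1-norm SVM baseline), trained by proximal subgradient descent.
# Names, data, and hyperparameters are assumptions, not the paper's method.

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def train_l1_svm(X, y, lam=0.1, lr=0.1, n_iter=200):
    """Approximately minimize (1/n) * sum(hinge) + lam * ||w||_1 over w, b."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute a subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        # Subgradient step on the hinge term, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: feature 0 determines the label; feature 1 is balanced within
# each class and therefore carries no information about the label.
X = [[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]

w, b = train_l1_svm(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected features:", selected)
```

Features whose weights are exactly zero after training are treated as discarded. The soft-thresholding step is what produces exact zeros here; a plain subgradient step on the L1 term would typically leave small nonzero weights instead.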