Ghosh Samiran, Wang Yazhen
Department of Family Medicine & Public Health Sciences, Wayne State University; Center of Molecular Medicine and Genetics, Wayne State University.
Department of Statistics, University of Wisconsin, Madison.
Stat Anal Data Min. 2015 Feb;8(1):49-63. doi: 10.1002/sam.11259. Epub 2015 Jan 26.
The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems have drawn much attention recently due to their robustness and generalization capability. The general theme is to construct classifiers based on the training data in a high-dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only the few observations which lie close to the boundary of the classifier function. However, when the number of observations is not very large (small n) but the number of dimensions/features is large (large p), it is not necessarily the case that all available features are of equal importance in the classification context. Selecting a useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such a set of features can be selected. In short, we reverse the traditional sequential observation selection strategy of the SVM to one of sequential feature selection. To achieve this, we modify the solution proposed by Zhu and Hastie (2005) in the context of the import vector machine (IVM) to select a sub-dimensional model with which to build the final classifier with sufficient accuracy.
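The sequential feature selection idea described above can be sketched as greedy forward selection for a kernel classifier: at each step, add the feature whose inclusion most improves held-out accuracy, and stop when the improvement falls below a threshold. The sketch below is illustrative only, not the paper's IVM-based algorithm: it uses a simple RBF kernel ridge classifier (rather than the IVM's regularized logistic loss) as the scoring model, and all names, the toy data, and the tolerance `tol` are assumptions introduced for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise squared Euclidean distances -> Gaussian (RBF) kernel matrix.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def validation_accuracy(Xtr, ytr, Xva, yva, lam=1e-2, gamma=1.0):
    # Fit a kernel ridge classifier on the training split and score it on
    # the validation split (training accuracy alone would reward noise
    # features, since an RBF kernel can interpolate any distinct points).
    K = rbf_kernel(Xtr, Xtr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(ytr)), ytr)
    preds = np.sign(rbf_kernel(Xva, Xtr, gamma) @ alpha)
    return np.mean(preds == yva)

def greedy_feature_selection(Xtr, ytr, Xva, yva, tol=0.02):
    # Sequentially add the single feature that most improves validation
    # accuracy; stop when the best improvement falls below `tol`.
    selected, best_acc = [], 0.0
    remaining = list(range(Xtr.shape[1]))
    while remaining:
        scores = [(validation_accuracy(Xtr[:, selected + [j]], ytr,
                                       Xva[:, selected + [j]], yva), j)
                  for j in remaining]
        acc, j = max(scores)
        if acc - best_acc < tol:
            break
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc

# Toy "small n, large-ish p" data: only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + X[:, 1])
feats, acc = greedy_feature_selection(X[:120], y[:120], X[120:], y[120:])
print(feats, acc)
```

The design point mirrors the abstract: instead of compressing the data by keeping a few boundary observations (support vectors), the loop compresses it by keeping a few informative feature dimensions, so the final kernel classifier is built in a sub-dimensional space.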