IEEE Trans Cybern. 2013 Dec;43(6):1990-2004. doi: 10.1109/TSMCB.2012.2237394.
Feature selection can reduce classifier size and improve accuracy by removing noisy or redundant features. However, feature selection may yield features that are only partially informative about the classes in the set: they help distinguish between some classes but not others. In these cases, it is beneficial to divide the large classification problem into a set of smaller problems, each of which can use a more specific set of features to separate its classes. Dividing a problem this way is also common when the base classifier is binary and the problem must be reformulated as a set of two-class problems that the classifier can handle. This paper presents a method for multiclass classification that simultaneously formulates a binary tree of simpler classification subproblems and performs feature selection for the individual classifiers. The feature selected hierarchical classifier (FSHC) is tested against several well-known techniques for multiclass division. Tests are run on nine real data sets and one artificial data set using a support vector machine (SVM) classifier. The results show that the accuracy obtained by the FSHC is comparable with that of other common multiclass SVM methods. Furthermore, the results demonstrate that the algorithm creates solutions with fewer classifiers, fewer features, and a shorter testing time than the other SVM multiclass extensions.
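The general idea of a binary tree of two-class subproblems, each with its own feature subset, can be illustrated with a minimal sketch. This is not the FSHC algorithm from the paper, whose details the abstract does not give. Everything below is an assumption made for illustration: the class-splitting heuristic (seed the two groups with the farthest-apart class centroids), the Fisher-style feature score, the nearest-centroid rule standing in for the SVM at each node, and all function names and toy data.

```python
from statistics import mean, pvariance

def centroid(rows):
    # Per-feature mean of a list of feature vectors.
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b, feats):
    # Squared Euclidean distance restricted to the given feature indices.
    return sum((a[j] - b[j]) ** 2 for j in feats)

def split_classes(X, y, classes):
    # Hypothetical heuristic: seed two groups with the two farthest-apart
    # class centroids, then attach every other class to the nearer seed.
    cents = {c: centroid([x for x, l in zip(X, y) if l == c]) for c in classes}
    feats = range(len(X[0]))
    pairs = ((p, q) for i, p in enumerate(classes) for q in classes[i + 1:])
    a, b = max(pairs, key=lambda pq: sq_dist(cents[pq[0]], cents[pq[1]], feats))
    left, right = [a], [b]
    for c in classes:
        if c in (a, b):
            continue
        nearer_a = sq_dist(cents[c], cents[a], feats) <= sq_dist(cents[c], cents[b], feats)
        (left if nearer_a else right).append(c)
    return left, right

def select_features(X, y, left, k):
    # Assumed Fisher-style score for the two-class subproblem; keep the top k.
    left = set(left)
    A = [x for x, l in zip(X, y) if l in left]
    B = [x for x, l in zip(X, y) if l not in left]
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x in A]
        b = [x[j] for x in B]
        score = (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-9)
        scores.append((score, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def build_tree(X, y, k=1):
    # Recursively divide the class set; each internal node stores its own
    # feature subset and a nearest-centroid rule (stand-in for an SVM).
    classes = sorted(set(y))
    if len(classes) == 1:
        return {"label": classes[0]}
    left, _ = split_classes(X, y, classes)
    feats = select_features(X, y, left, k)
    XL = [x for x, l in zip(X, y) if l in left]
    yL = [l for l in y if l in left]
    XR = [x for x, l in zip(X, y) if l not in left]
    yR = [l for l in y if l not in left]
    return {
        "features": feats,
        "left_centroid": centroid(XL),
        "right_centroid": centroid(XR),
        "left": build_tree(XL, yL, k),
        "right": build_tree(XR, yR, k),
    }

def predict(node, x):
    # Walk from the root to a leaf, using only each node's selected features.
    while "label" not in node:
        f = node["features"]
        go_left = sq_dist(x, node["left_centroid"], f) <= sq_dist(x, node["right_centroid"], f)
        node = node["left"] if go_left else node["right"]
    return node["label"]

# Toy data (invented): feature 0 separates "a" from {"b", "c"},
# feature 1 separates "b" from "c", and feature 2 is noise.
X = [[0.0, 5.0, 0.3], [0.2, 5.1, 0.7],
     [10.0, 0.0, 0.5], [10.2, 0.1, 0.4],
     [10.1, 10.0, 0.6], [9.9, 10.2, 0.2]]
y = ["a", "a", "b", "b", "c", "c"]
tree = build_tree(X, y, k=1)
```

On this toy data the root node keeps only feature 0 and the lower node keeps only feature 1, so a test point is classified with at most two single-feature comparisons. A three-class problem needs two node classifiers in such a tree, compared with three for a one-vs-one scheme, which gives a sense of why a hierarchical division can use fewer classifiers and features.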