Liu Yufeng, Zhang Hao Helen, Wu Yichao
Department of Statistics and Operations Research, Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599.
J Am Stat Assoc. 2011 Mar 1;106(493):166-177. doi: 10.1198/jasa.2011.tm10319.
Margin-based classifiers have been popular in both machine learning and statistics for classification problems. Among the many available classifiers, some are hard classifiers while others are soft ones. Soft classifiers explicitly estimate the class conditional probabilities and then perform classification based on the estimated probabilities. In contrast, hard classifiers directly target the classification decision boundary without producing probability estimates. These two types of classifiers are based on different philosophies, and each has its own merits. In this paper, we propose a novel family of large-margin classifiers, namely large-margin unified machines (LUMs), which covers a broad range of margin-based classifiers including both hard and soft ones. By offering a natural bridge from soft to hard classification, the LUM provides a unified algorithm to fit various classifiers and hence a convenient platform to compare hard and soft classification. Both the theoretical consistency and the numerical performance of LUMs are explored. Our numerical study sheds some light on the choice between hard and soft classifiers in various classification problems.
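To make the bridge from soft to hard classification concrete, the sketch below evaluates a LUM-type loss on the functional margin u = yf(x), using the parameterization with index a > 0 and c >= 0 attributed to Liu, Zhang, and Wu (2011): small c yields a soft classifier, c = a = 1 gives a DWD-type loss, and large c approaches the SVM hinge loss. The function name and the choice of NumPy are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def lum_loss(u, a=1.0, c=1.0):
    """LUM-type loss on the margin u = y*f(x).

    Piecewise form (as recalled from the LUM family, a > 0, c >= 0):
      V(u) = 1 - u                                   if u <  c/(1+c)
      V(u) = (1/(1+c)) * (a / ((1+c)u - c + a))**a   if u >= c/(1+c)
    Illustrative sketch only; parameter names follow the paper's notation.
    """
    u = np.asarray(u, dtype=float)
    thresh = c / (1.0 + c)
    linear = 1.0 - u                         # hinge-like part for small margins
    safe_u = np.maximum(u, thresh)           # keep the unused branch numerically valid
    tail = (1.0 / (1.0 + c)) * (a / ((1.0 + c) * safe_u - c + a)) ** a
    return np.where(u < thresh, linear, tail)

# Larger u means a more confident, correct classification.
margins = np.linspace(-2, 3, 7)
for c in (0.0, 1.0, 1e6):   # soft, DWD-like, and nearly hinge (hard) regimes
    print(f"c={c:g}:", np.round(lum_loss(margins, a=1.0, c=c), 3))
```

Under this parameterization the two pieces meet continuously at u = c/(1+c), so varying c traces a smooth path between soft and hard classification losses.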