Zhang Chong, Liu Yufeng
Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Department of Statistics and Operations Research, Carolina Center for Genome Sciences, Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
J Mach Learn Res. 2013 May 1;14:1349-1386.
Hard and soft classifiers are two important groups of techniques for classification problems. Logistic regression and Support Vector Machines are typical examples of soft and hard classifiers, respectively. The essential difference between these two groups is whether one needs to estimate the class conditional probability for the classification task. In particular, soft classifiers predict the label based on the estimated class conditional probabilities, while hard classifiers bypass probability estimation and focus directly on the decision boundary. In practice, when the goal is accurate classification, it is often unclear which type to use in a given situation. To tackle this problem, the Large-margin Unified Machine (LUM) was recently proposed as a unified family that embraces both groups. The LUM family enables one to study the behavior change from soft to hard binary classifiers. For multicategory cases, however, the distinction between soft and hard classification becomes less clear, and class probability estimation becomes more involved because it requires estimating a probability vector. In this paper, we propose a new Multicategory LUM (MLUM) framework to investigate the behavior of soft versus hard classification under multicategory settings. Our theoretical and numerical results help shed light on the nature of multicategory classification and its transition behavior from soft to hard classifiers. The numerical results suggest that the proposed tuned MLUM yields very competitive performance.
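To make the soft-to-hard transition of the binary LUM family concrete, the sketch below evaluates the binary LUM loss as we understand its published form from the original LUM proposal. This is not the MLUM introduced in this paper, and the formula should be checked against the original LUM paper before use. The loss is a function of the functional margin u = y f(x) with y in {-1, +1}, indexed by a > 0 and c >= 0: it equals 1 - u for u < c/(1+c), and (1/(1+c)) (a / ((1+c)u - c + a))^a otherwise. Under this form, c = 0 gives a soft (probability-estimating) loss, while letting c grow pushes the loss toward the SVM hinge loss max(0, 1 - u). The function names `lum_loss` and `hinge_loss` are hypothetical.

```python
import math


def lum_loss(u: float, a: float = 1.0, c: float = 0.0) -> float:
    """Binary LUM loss V(u) at the functional margin u = y * f(x).

    Assumed form (check against the original LUM paper):
      V(u) = 1 - u                                        if u <  c/(1+c)
      V(u) = (1/(1+c)) * (a / ((1+c)*u - c + a)) ** a     if u >= c/(1+c)
    with a > 0 and c >= 0. Small c is soft; large c approaches the hinge loss.
    """
    if a <= 0 or c < 0 or math.isinf(c):
        raise ValueError("need a > 0 and finite c >= 0")
    threshold = c / (1.0 + c)
    if u < threshold:
        # Linear piece, identical to the hinge loss on this region.
        return 1.0 - u
    # Smooth, strictly positive tail that decays toward 0 as u grows.
    return (1.0 / (1.0 + c)) * (a / ((1.0 + c) * u - c + a)) ** a


def hinge_loss(u: float) -> float:
    """SVM hinge loss max(0, 1 - u), the hard-classifier reference."""
    return max(0.0, 1.0 - u)


if __name__ == "__main__":
    margins = [-1.0, 0.0, 0.5, 1.0, 2.0]
    # Compare a soft member (c = 0), an intermediate one, and a nearly hard one.
    for c in (0.0, 1.0, 1e6):
        row = ", ".join(f"{lum_loss(u, a=1.0, c=c):.4f}" for u in margins)
        print(f"LUM a=1, c={c:g}: {row}")
    print("hinge:           " + ", ".join(f"{hinge_loss(u):.4f}" for u in margins))
```

Printing the rows side by side shows that, for fixed a, increasing c makes the loss at u >= 1 shrink toward zero, so the large-c member behaves like the hinge loss while c = 0 keeps a strictly positive tail.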