Zhang Mingyuan, Agarwal Shivani
University of Pennsylvania, Philadelphia, PA 19104.
Adv Neural Inf Process Syst. 2020 Dec;33:16927-16936.
A fundamental question in multiclass classification concerns understanding the consistency properties of surrogate risk minimization algorithms, which minimize an (often convex) surrogate to the multiclass 0-1 loss. In particular, the framework of calibrated surrogates has played an important role in analyzing the Bayes consistency of such algorithms, i.e. in studying convergence to a Bayes optimal classifier (Zhang, 2004; Tewari and Bartlett, 2007). However, follow-up work has suggested this framework can be of limited value when studying H-consistency; in particular, concerns have been raised that even when the data comes from an underlying linear model, minimizing certain convex calibrated surrogates over linear scoring functions fails to recover the true model (Long and Servedio, 2013). In this paper, we investigate this apparent conundrum. We find that while some calibrated surrogates can indeed fail to provide H-consistency when minimized over a natural-looking but naïvely chosen scoring function class, the situation can potentially be remedied by minimizing them over a more carefully chosen class of scoring functions F. In particular, for the popular one-vs-all hinge and logistic surrogates, both of which are calibrated (and therefore provide Bayes consistency) under realizable models, but were previously shown to pose problems for realizable H-consistency, we derive a form of scoring function class F that enables H-consistency. When H is the class of linear models, the class F consists of certain piecewise linear scoring functions that are characterized by the same number of parameters as in the linear case, and over which minimization can be performed using an adaptation of the min-pooling idea from neural network training. Our experiments confirm that the one-vs-all surrogates, when trained over this class of scoring functions F, yield better multiclass classifiers than when trained over standard linear scoring functions.
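To make the min-pooling idea concrete, the sketch below shows one plausible instantiation; the abstract only states that, for linear H, the class F consists of piecewise linear scoring functions with the same parameter count as a linear model, so the exact construction must be taken from the paper itself. The sketch assumes the score for class y is the min-pooled pairwise margin min_{y'≠y} (w_y − w_{y'})ᵀx of an ordinary K×d linear model, trained with the one-vs-all logistic surrogate; the names MinPooledLinearScorer and ova_logistic_loss are illustrative, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinPooledLinearScorer(nn.Module):
    """One weight vector per class, as in a plain linear model (assumed
    construction): the score for class y is the min-pooled pairwise margin
    f_y(x) = min_{y' != y} (w_y - w_{y'})^T x, a piecewise linear function
    with the same number of parameters as the linear case."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(num_classes, dim))

    def forward(self, x):                          # x: (n, dim)
        z = x @ self.W.T                           # (n, K) ordinary linear scores
        diff = z.unsqueeze(2) - z.unsqueeze(1)     # diff[i, y, y'] = z_y - z_{y'}
        eye = torch.eye(z.shape[1], dtype=torch.bool, device=z.device)
        diff = diff.masked_fill(eye, float("inf"))  # exclude y' == y from the min
        return diff.min(dim=2).values              # (n, K) min-pooled margins

def ova_logistic_loss(scores, y):
    """One-vs-all logistic surrogate: a binary logistic loss per class,
    with target +1 for the true class and -1 for every other class."""
    sign = 2.0 * F.one_hot(y, scores.shape[1]).float() - 1.0
    return F.softplus(-sign * scores).mean()

# Hypothetical usage: minimize the surrogate over the min-pooled class.
model = MinPooledLinearScorer(dim=10, num_classes=4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 4, (32,))
loss = ova_logistic_loss(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
pred = model(x).argmax(dim=1)  # predict the class with the largest margin
```

Under this assumed instantiation, f_y(x) is positive exactly when y is the argmax of the underlying linear model, which is intuitively why the one-vs-all surrogate can track the best-in-class linear predictor here; prediction remains an argmax over the K scores.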