Zhang Mingyuan, Agarwal Shivani
University of Pennsylvania, Philadelphia, PA 19104.
Adv Neural Inf Process Syst. 2020 Dec;33:16927-16936.
A fundamental question in multiclass classification concerns understanding the consistency properties of surrogate risk minimization algorithms, which minimize an (often convex) surrogate to the multiclass 0-1 loss. In particular, the framework of calibrated surrogates has played an important role in analyzing the Bayes consistency of such algorithms, i.e. in studying convergence to a Bayes optimal classifier (Zhang, 2004; Tewari and Bartlett, 2007). However, follow-up work has suggested this framework can be of limited value when studying H-consistency; in particular, concerns have been raised that even when the data comes from an underlying linear model, minimizing certain convex calibrated surrogates over linear scoring functions fails to recover the true model (Long and Servedio, 2013). In this paper, we investigate this apparent conundrum. We find that while some calibrated surrogates can indeed fail to provide H-consistency when minimized over a natural-looking but naïvely chosen scoring function class, the situation can potentially be remedied by minimizing them over a more carefully chosen class of scoring functions F. In particular, for the popular one-vs-all hinge and logistic surrogates, both of which are calibrated (and therefore provide Bayes consistency) under realizable models, but were previously shown to pose problems for realizable H-consistency, we derive a form of scoring function class F that enables H-consistency. When H is the class of linear models, the class F consists of certain piecewise linear scoring functions that are characterized by the same number of parameters as in the linear case, and over which minimization can be performed using an adaptation of the min-pooling idea from neural network training. Our experiments confirm that the one-vs-all surrogates, when trained over this class of scoring functions F, yield better multiclass classifiers than when trained over standard linear scoring functions.
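To make the min-pooling idea concrete, the sketch below shows one plausible instantiation; the abstract only states that, for linear H, the class F consists of piecewise linear scoring functions with the same parameter count as a linear model, so the exact construction must be taken from the paper itself. The sketch assumes the score for class y is the min-pooled pairwise margin min_{y'≠y} (w_y − w_{y'})ᵀx of an ordinary K×d linear model, trained with the one-vs-all logistic surrogate; the names MinPooledLinearScorer and ova_logistic_loss are illustrative, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinPooledLinearScorer(nn.Module):
    """One weight vector per class, as in a plain linear model (assumed
    construction): the score for class y is the min-pooled pairwise margin
    f_y(x) = min_{y' != y} (w_y - w_{y'})^T x, a piecewise linear function
    with the same number of parameters as the linear case."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(num_classes, dim))

    def forward(self, x):                          # x: (n, dim)
        z = x @ self.W.T                           # (n, K) ordinary linear scores
        diff = z.unsqueeze(2) - z.unsqueeze(1)     # diff[i, y, y'] = z_y - z_{y'}
        eye = torch.eye(z.shape[1], dtype=torch.bool, device=z.device)
        diff = diff.masked_fill(eye, float("inf"))  # exclude y' == y from the min
        return diff.min(dim=2).values              # (n, K) min-pooled margins

def ova_logistic_loss(scores, y):
    """One-vs-all logistic surrogate: a binary logistic loss per class,
    with target +1 for the true class and -1 for every other class."""
    sign = 2.0 * F.one_hot(y, scores.shape[1]).float() - 1.0
    return F.softplus(-sign * scores).mean()

# Hypothetical usage: minimize the surrogate over the min-pooled class.
model = MinPooledLinearScorer(dim=10, num_classes=4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 4, (32,))
loss = ova_logistic_loss(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
pred = model(x).argmax(dim=1)  # predict the class with the largest margin
```

Under this assumed instantiation, f_y(x) is positive exactly when y is the argmax of the underlying linear model, which is intuitively why the one-vs-all surrogate can track the best-in-class linear predictor here; prediction remains an argmax over the K scores.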