Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures.

Authors

Kompa Benjamin, Snoek Jasper, Beam Andrew L

Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.

Google Research, Cambridge, MA 02142, USA.

Publication information

Entropy (Basel). 2021 Nov 30;23(12):1608. doi: 10.3390/e23121608.

Abstract

Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings. Currently, the quality of a model's uncertainty is evaluated using point-prediction metrics, such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for many popular uncertainty quantification techniques for deep learning models. With marginal coverage and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on in-distribution samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.
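
As a rough illustration of the two quantities the abstract pairs together, the sketch below computes the empirical marginal coverage and the mean width of a set of regression prediction intervals. It is not the authors' evaluation code: the function names `empirical_coverage` and `mean_width` and the toy numbers are assumptions made for this example only.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of held-out targets that fall inside their prediction interval."""
    hits = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)


def mean_width(lower, upper):
    """Average interval width; narrower is better at equal coverage."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return sum(widths) / len(widths)


# Toy held-out targets and per-sample interval bounds (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.5, 3.0]
upper = [1.5, 2.5, 4.5, 5.0]

print(empirical_coverage(y_true, lower, upper))  # 0.75 (the third interval misses)
print(mean_width(lower, upper))                  # 1.25
```

Under this reading, a procedure that targets a nominal level (for example 95%) would be judged by how close the first number comes to that level on held-out data, with the second number showing how wide the intervals had to be to get there.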

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8bd/8700765/bbb762189d5b/entropy-23-01608-g0A1.jpg
