Calibrate: Interactive Analysis of Probabilistic Model Output.

Authors

Xenopoulos Peter, Rulff Joao, Nonato Luis Gustavo, Barr Brian, Silva Claudio

Publication information

IEEE Trans Vis Comput Graph. 2023 Jan;29(1):853-863. doi: 10.1109/TVCG.2022.3209489. Epub 2022 Dec 16.

Abstract

Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, like accuracy, many applications, such as weather prediction, sports betting, or patient risk prediction, rely on a classifier's predicted probabilities rather than predicted labels. In these instances, practitioners are concerned with producing a calibrated model, that is, one which outputs probabilities that reflect those of the true distribution. Model calibration is often analyzed visually through static reliability diagrams; however, the traditional calibration visualization may suffer from a variety of drawbacks due to the strong aggregations it necessitates. Furthermore, count-based approaches are unable to sufficiently analyze model calibration. We present Calibrate, an interactive reliability diagram that addresses the aforementioned issues. Calibrate constructs a reliability diagram that is resistant to the drawbacks of traditional approaches, and allows for interactive subgroup analysis and instance-level inspection. We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data. We further validate Calibrate by presenting the results of a think-aloud experiment with data scientists who routinely analyze model calibration.
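For readers unfamiliar with the traditional reliability diagram that the abstract contrasts Calibrate with, the following is a minimal sketch of the usual binned computation for a binary classifier. It is not the method used by Calibrate. The function names `reliability_bins` and `expected_calibration_error`, the equal-width binning, and the default of 10 bins are assumptions made for illustration. The per-bin averaging is the kind of strong aggregation the abstract refers to.

```python
import numpy as np


def reliability_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin.

    Hypothetical helper: equal-width bins over [0, 1] for a binary classifier.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use only the interior edges so that every probability in [0, 1],
    # including exactly 1.0, maps to a bin index in 0..n_bins-1.
    idx = np.digitize(y_prob, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((y_prob[mask].mean(), y_true[mask].mean(), int(mask.sum())))
    return rows


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Count-weighted mean gap between predicted probability and observed frequency."""
    rows = reliability_bins(y_true, y_prob, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(count / total * abs(conf - freq) for conf, freq, count in rows)
```

Plotting the first two values of each row against each other gives the static reliability diagram. A perfectly calibrated model lies on the diagonal, and the result can change with the choice of `n_bins`.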

