ROC and AUC with a Binary Predictor: a Potentially Misleading Metric.

Author information

Muschelli John

Affiliation

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205.

Publication information

J Classif. 2020 Oct;37(3):696-708. doi: 10.1007/s00357-019-09345-1. Epub 2019 Dec 23.

Abstract

In the analysis of binary outcomes, the receiver operator characteristic (ROC) curve is heavily used to show the performance of a model or algorithm. The ROC curve is informative about performance over a series of thresholds and can be summarized by the area under the curve (AUC), a single number. When a predictor is categorical, the ROC curve has one fewer potential threshold than the number of categories; when the predictor is binary, there is only one threshold. As the AUC may be used in decision-making processes to determine the best model, it is important to discuss how it agrees with the intuition from the ROC curve. We discuss how the interpolation of the curve between thresholds with binary predictors can substantially change the AUC. Overall, we show that linear interpolation of the ROC curve with binary predictors corresponds to the estimated AUC most commonly reported by software, which we believe can lead to misleading results. We compare R, Python, Stata, and SAS software implementations. We recommend reporting the interpolation used and discuss the merit of using the step function interpolator, also referred to as the "pessimistic" approach by Fawcett (2006).
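
The difference between the two interpolations can be illustrated with a minimal Python sketch. The data, function names, and variable names below are hypothetical and are not taken from the paper; the sketch assumes a 0/1 predictor with a single ROC point and compares the trapezoidal (linear) area with the area under a step function that stays at the lower value between thresholds.

```python
def confusion_rates(y_true, x_pred):
    """Return (fpr, tpr) for a 0/1 predictor against 0/1 outcomes."""
    tp = sum(1 for y, x in zip(y_true, x_pred) if y == 1 and x == 1)
    fp = sum(1 for y, x in zip(y_true, x_pred) if y == 0 and x == 1)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


def auc_linear(fpr, tpr):
    """Trapezoidal area under the path (0,0) -> (fpr,tpr) -> (1,1)."""
    points = [(0.0, 0.0), (fpr, tpr), (1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_step(fpr, tpr):
    """Area under a step function: height 0 up to fpr, then tpr up to 1."""
    return tpr * (1.0 - fpr)


# Hypothetical data: 10 cases and 10 controls.
# The predictor flags 7 of the cases and 2 of the controls.
y_true = [1] * 10 + [0] * 10
x_pred = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8

fpr, tpr = confusion_rates(y_true, x_pred)
linear = auc_linear(fpr, tpr)
step = auc_step(fpr, tpr)

print(f"sensitivity={tpr:.2f}, specificity={1 - fpr:.2f}")
print(f"linear AUC={linear:.2f}, step AUC={step:.2f}")
```

In this sketch the linear AUC reduces to the average of sensitivity and specificity (about 0.75 here), while the step-function AUC reduces to their product (about 0.56), so the same single ROC point yields noticeably different summaries depending on the interpolation.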
