Area under the ROC Curve has the most consistent evaluation for binary classification.

Author information

Li Jing

Affiliation

Department of Political Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America.

Publication information

PLoS One. 2024 Dec 23;19(12):e0316019. doi: 10.1371/journal.pone.0316019. eCollection 2024.

DOI:10.1371/journal.pone.0316019

PMID:39715186

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11666033/

Abstract

The proper use of model evaluation metrics is important for model evaluation and model selection in binary classification tasks. This study investigates how consistent different metrics are at evaluating models across data of different prevalence while the relationships between different variables and the sample size are kept constant. Analyzing 156 data scenarios, 18 model evaluation metrics and five commonly used machine learning models as well as a naive random guess model, I find that evaluation metrics that are less influenced by prevalence offer more consistent evaluation of individual models and more consistent ranking of a set of models. In particular, Area Under the ROC Curve (AUC), which takes all decision thresholds into account when evaluating models, has the smallest variance in evaluating individual models and the smallest variance in ranking a set of models. A close threshold analysis using all possible thresholds for all metrics further supports the hypothesis that considering all decision thresholds helps reduce the variance in model evaluation with respect to prevalence change in data. The results have significant implications for model evaluation and model selection in binary classification tasks.
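The contrast the abstract draws between a threshold-free metric and a single-threshold metric can be illustrated with a small simulation. The sketch below is not the paper's experimental design: the Gaussian score model, the fixed threshold of 0.5, the sample size, and the names `auc_score`, `f1_score`, and `simulate` are all assumptions chosen for illustration. It keeps the score distributions and the sample size fixed, varies only the prevalence, and compares how much AUC and F1 move.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a random positive scores higher
    than a random negative, counting ties as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Pairwise comparison; adequate for a few thousand samples.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def f1_score(y_true, y_pred):
    """F1 computed from a single set of hard predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def simulate(prevalence, n=2000, seed=0):
    """Hypothetical scenario: positives score ~ N(1, 1), negatives ~ N(0, 1).
    Only the share of positives changes between calls."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=bool)
    y[: int(round(n * prevalence))] = True
    scores = rng.normal(loc=np.where(y, 1.0, 0.0), scale=1.0)
    y_pred = scores > 0.5  # assumed fixed decision threshold
    return auc_score(y, scores), f1_score(y, y_pred)

results = {p: simulate(p) for p in (0.1, 0.3, 0.5)}
for p, (auc, f1) in results.items():
    print(f"prevalence={p:.1f}  AUC={auc:.3f}  F1={f1:.3f}")
```

Under these assumptions the AUC values should stay close to one another across prevalence levels, while F1 shifts substantially because precision at a fixed threshold depends on the class balance. This is only a toy case with one score model and one seed; the study itself covers 156 data scenarios and 18 metrics.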

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aae4/11666033/21f188dbadf2/pone.0316019.g001.jpg

Similar articles

Area under the ROC Curve has the most consistent evaluation for binary classification.

PLoS One. 2024 Dec 23;19(12):e0316019. doi: 10.1371/journal.pone.0316019. eCollection 2024.

Assessment of performance of survival prediction models for cancer prognosis.

BMC Med Res Methodol. 2012 Jul 23;12:102. doi: 10.1186/1471-2288-12-102.

Novel learning framework (knockoff technique) to evaluate metric ranking algorithms to describe human response to injury.

Traffic Inj Prev. 2018;19(sup2):S121-S126. doi: 10.1080/15389588.2018.1519805. Epub 2018 Dec 20.

A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms.

BMC Med Inform Decis Mak. 2020 Jan 6;20(1):4. doi: 10.1186/s12911-019-1014-6.

Regularized binormal ROC method in disease classification using microarray data.

BMC Bioinformatics. 2006 May 9;7:253. doi: 10.1186/1471-2105-7-253.

Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?

Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.

Assessing the performance of methods for central statistical monitoring of a binary or continuous outcome in multi-center trials: A simulation study.

Contemp Clin Trials. 2024 Aug;143:107580. doi: 10.1016/j.cct.2024.107580. Epub 2024 May 23.

ROC curves for clinical prediction models part 1. ROC plots showed no added value above the AUC when evaluating the performance of clinical prediction models.

J Clin Epidemiol. 2020 Oct;126:207-216. doi: 10.1016/j.jclinepi.2020.01.028. Epub 2020 Jul 23.

Comparison of statistical machine learning models for rectal protocol compliance in prostate external beam radiation therapy.

Med Phys. 2020 Apr;47(4):1452-1459. doi: 10.1002/mp.14044. Epub 2020 Feb 19.

Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application.

Stat Med. 2016 Jun 15;35(13):2251-82. doi: 10.1002/sim.6863. Epub 2016 Jan 21.

Cited by

A data-driven analysis of lumbar steroid injection satisfaction in patients with chronic low back pain.

Sci Rep. 2025 Jul 29;15(1):27734. doi: 10.1038/s41598-025-10907-0.

Unveiling the molecular mechanisms of stigmasterol on diabetic retinopathy: BNM framework construction and experimental validation.

Front Med (Lausanne). 2025 May 9;12:1537139. doi: 10.3389/fmed.2025.1537139. eCollection 2025.

Diagnostic accuracy of nanopore sequencing for detecting Mycobacterium tuberculosis and drug-resistant strains: a systematic review and meta-analysis.

Sci Rep. 2025 Apr 4;15(1):11626. doi: 10.1038/s41598-025-90089-x.

Development and validation of a machine learning model to predict the risk of lymph node metastasis in early-stage supraglottic laryngeal cancer.

Front Oncol. 2025 Jan 29;15:1525414. doi: 10.3389/fonc.2025.1525414. eCollection 2025.

Comparative Analysis of Recurrent Neural Networks with Conjoint Fingerprints for Skin Corrosion Prediction.

J Chem Inf Model. 2025 Feb 10;65(3):1305-1317. doi: 10.1021/acs.jcim.4c02062. Epub 2025 Jan 21.
