
G4 & the balanced metric family - a novel approach to solving binary classification problems in medical device validation & verification studies.

Author information

Marra Andrew

Affiliation

Clinical Biostatistician at GE Healthcare, Chicago, IL, USA.

Publication information

BioData Min. 2024 Oct 23;17(1):43. doi: 10.1186/s13040-024-00402-z.

DOI:10.1186/s13040-024-00402-z
PMID:39444008
Full text:https://pmc.ncbi.nlm.nih.gov/articles/PMC11515465/
Abstract

BACKGROUND

In medical device validation and verification studies, the area under the receiver operating characteristic curve (AUROC) is often used as a primary endpoint despite multiple reports showing its limitations. Hence, researchers are encouraged to consider alternative metrics as primary endpoints. A new metric called G4 is presented, which is the geometric mean of sensitivity, specificity, the positive predictive value, and the negative predictive value. G4 is part of a balanced metric family which includes the Unified Performance Measure (also known as P4) and the Matthews' Correlation Coefficient (MCC). The purpose of this manuscript is to unveil the benefits of using G4 together with the balanced metric family when analyzing the overall performance of binary classifiers.
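As a rough illustration of the definitions above, the sketch below computes G4, P4, and MCC from the four cells of a confusion matrix. G4 follows the abstract's definition (geometric mean of sensitivity, specificity, PPV, and NPV). The P4 and MCC formulas are not given in the abstract; they are assumed here from their commonly used definitions (P4 as the harmonic mean of the same four quantities, MCC as the correlation between predicted and true labels). The function name `balanced_metrics` and the example counts are hypothetical.

```python
import math

def balanced_metrics(tp, fp, tn, fn):
    """Return (G4, P4, MCC) for one confusion matrix.

    Assumes every row and column total is non-zero and that
    tp and tn are both positive, so no division by zero occurs.
    """
    sens = tp / (tp + fn)   # sensitivity (recall)
    spec = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)    # positive predictive value (precision)
    npv = tn / (tn + fn)    # negative predictive value

    # G4: geometric mean of the four quantities (as defined in the abstract).
    g4 = (sens * spec * ppv * npv) ** 0.25

    # P4: harmonic mean of the same four quantities (assumed standard definition).
    p4 = 4.0 / (1.0 / sens + 1.0 / spec + 1.0 / ppv + 1.0 / npv)

    # MCC: assumed standard definition.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return g4, p4, mcc

# Hypothetical balanced example: all four component rates equal 0.8.
g4, p4, mcc = balanced_metrics(tp=40, fp=10, tn=40, fn=10)
print(f"G4={g4:.3f}  P4={p4:.3f}  MCC={mcc:.3f}")
```

G4 and P4 range from 0 to 1 while MCC ranges from -1 to 1, so the three values are not directly interchangeable without rescaling.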

RESULTS

Simulated datasets encompassing different prevalence rates of the minority class were analyzed under a multi-reader-multi-case study design. In addition, data from an independently published study that tested the performance of a unique ultrasound artificial intelligence algorithm in the context of breast cancer detection was also considered. Within each dataset, AUROC was reported alongside the balanced metric family for comparison. When the dataset prevalence and bias of the minority class approached 50%, all three balanced metrics provided equivalent interpretations of an AI's performance. As the prevalence rate increased / decreased and the data became more imbalanced, AUROC tended to overvalue / undervalue the true classifier performance, while the balanced metric family was resistant to such imbalance. Under certain circumstances where data imbalance was strong (minority-class prevalence < 10%), MCC was preferred for standalone assessments while P4 provided a stronger effect size when evaluating between-groups analyses. G4 acted as a middle ground for maximizing both standalone assessments and between-groups analyses.
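The prevalence effect described above can be explored with a small sketch. It is not the paper's multi-reader-multi-case simulation: it assumes a single operating point with fixed sensitivity and specificity, builds the expected confusion matrix at several prevalence values, and reports G4, P4, and MCC next to the single-threshold AUROC, which for a binary predictor equals (sensitivity + specificity) / 2. All function names, the sample size, and the rates are hypothetical, and the P4 and MCC formulas are the commonly used definitions rather than ones quoted from the abstract.

```python
import math

def expected_confusion(n, prevalence, sens, spec):
    """Expected (tp, fp, tn, fn) counts for n cases at a given prevalence."""
    pos = n * prevalence
    neg = n - pos
    tp = sens * pos
    fn = pos - tp
    tn = spec * neg
    fp = neg - tn
    return tp, fp, tn, fn

def summarize(tp, fp, tn, fn):
    """Return single-point AUROC, G4, P4, and MCC (assumes non-zero margins)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        # AUROC of a binary (single-threshold) predictor.
        "auroc": (sens + spec) / 2.0,
        # Geometric mean of the four rates, per the abstract.
        "g4": (sens * spec * ppv * npv) ** 0.25,
        # Harmonic mean of the four rates (assumed definition of P4).
        "p4": 4.0 / (1.0 / sens + 1.0 / spec + 1.0 / ppv + 1.0 / npv),
        # Assumed standard definition of MCC.
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical classifier: 90% sensitivity and 90% specificity.
for prevalence in (0.50, 0.20, 0.05):
    cells = expected_confusion(n=1000, prevalence=prevalence, sens=0.9, spec=0.9)
    m = summarize(*cells)
    print(prevalence, {name: round(value, 3) for name, value in m.items()})
```

Under these assumptions the single-point AUROC stays the same at every prevalence, whereas G4, P4, and MCC change because they incorporate PPV and NPV, which depend on how rare the minority class is.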

CONCLUSIONS

Use of AUROC as the primary endpoint in binary classification problems provides misleading results as the dataset becomes more imbalanced. This is explicitly noticed when incorporating AUROC in medical device validation and verification studies. G4, P4, and MCC do not share this limitation and paint a more complete picture of a medical device's performance in a clinical setting. Therefore, researchers are encouraged to explore the balanced metric family when evaluating binary classification problems.
