Understanding metric-related pitfalls in image analysis validation.

Affiliations

German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany.

German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany.

Publication information

Nat Methods. 2024 Feb;21(2):182-194. doi: 10.1038/s41592-023-02150-0. Epub 2024 Feb 12.

DOI:10.1038/s41592-023-02150-0
PMID:38347140
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11181963/
Abstract

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
