Metrics reloaded: recommendations for image analysis validation.

Affiliations

German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany.

German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany.

Publication information

Nat Methods. 2024 Feb;21(2):195-212. doi: 10.1038/s41592-023-02151-z. Epub 2024 Feb 12.

DOI:10.1038/s41592-023-02151-z

PMID:38347141

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11182665/

Abstract

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

Similar articles

Metrics reloaded: recommendations for image analysis validation.

Nat Methods. 2024 Feb;21(2):195-212. doi: 10.1038/s41592-023-02151-z. Epub 2024 Feb 12.

Understanding metric-related pitfalls in image analysis validation.

Nat Methods. 2024 Feb;21(2):182-194. doi: 10.1038/s41592-023-02150-0. Epub 2024 Feb 12.

Understanding metric-related pitfalls in image analysis validation.

ArXiv. 2024 Feb 23:arXiv:2302.01790v4.

Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation.

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):3139-3153. doi: 10.1109/TPAMI.2020.3045882. Epub 2022 May 5.

Semantic-Aware Contrastive Learning for Multi-Object Medical Image Segmentation.

IEEE J Biomed Health Inform. 2023 Sep;27(9):4444-4453. doi: 10.1109/JBHI.2023.3285230. Epub 2023 Sep 6.

CLoDSA: a tool for augmentation in classification, localization, detection, semantic segmentation and instance segmentation tasks.

BMC Bioinformatics. 2019 Jun 13;20(1):323. doi: 10.1186/s12859-019-2931-1.

Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.

Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.

Unbiased image segmentation assessment toolkit for quantitative differentiation of state-of-the-art algorithms and pipelines.

BMC Bioinformatics. 2023 Oct 12;24(1):388. doi: 10.1186/s12859-023-05486-8.

Depth Density Achieves a Better Result for Semantic Segmentation with the Kinect System.

Sensors (Basel). 2020 Feb 3;20(3):812. doi: 10.3390/s20030812.

Image Semantic Recognition and Segmentation Algorithm of Colorimetric Sensor Array Based on Deep Convolutional Neural Network.

Comput Intell Neurosci. 2022 Sep 30;2022:2439371. doi: 10.1155/2022/2439371. eCollection 2022.

Cited by

MRI annotation using an inversion-based preprocessing for CT model adaptation.

Eur Radiol Exp. 2025 Sep 19;9(1):93. doi: 10.1186/s41747-025-00626-6.

[Multimodal data processing through AI: envisioning the operating room of the future].

Chirurgie (Heidelb). 2025 Sep 12. doi: 10.1007/s00104-025-02377-x.

EPISeg: Automated segmentation of the spinal cord on echo planar images using open-access multi-center data.

Imaging Neurosci (Camb). 2025 Sep 9;3. doi: 10.1162/IMAG.a.98. eCollection 2025.

Smartphones as Catalysts for Synergistic Nutrition: A New Era in Bioactive Detection, Personalization, and Food System Intelligence.

Food Sci Nutr. 2025 Sep 2;13(9):e70880. doi: 10.1002/fsn3.70880. eCollection 2025 Sep.

Large-vocabulary segmentation for medical images with text prompts.

NPJ Digit Med. 2025 Sep 2;8(1):566. doi: 10.1038/s41746-025-01964-w.

Advancing standards in biomedical image analysis validation: A perspective on Metrics Reloaded.

Clin Transl Med. 2025 Sep;15(9):e70237. doi: 10.1002/ctm2.70237.

[Translational challenges and clinical potential of artificial intelligence in minimally invasive surgery].

Chirurgie (Heidelb). 2025 Aug 26. doi: 10.1007/s00104-025-02366-0.

Individual Segmentation of Intertwined Apple Trees in a Row via Prompt Engineering.

Sensors (Basel). 2025 Jul 31;25(15):4721. doi: 10.3390/s25154721.

A comprehensive and reliable protocol for manual segmentation of the human claustrum using high-resolution MRI.

Brain Struct Funct. 2025 Aug 13;230(7):134. doi: 10.1007/s00429-025-02993-7.

Automatic segmentation of spinal cord lesions in MS: A robust tool for axial T2-weighted MRI scans.

Imaging Neurosci (Camb). 2025 Jun 20;3. doi: 10.1162/IMAG.a.45. eCollection 2025.

References

Understanding metric-related pitfalls in image analysis validation.

Nat Methods. 2024 Feb;21(2):182-194. doi: 10.1038/s41592-023-02150-0. Epub 2024 Feb 12.

Sources of performance variability in deep learning-based polyp detection.

Int J Comput Assist Radiol Surg. 2023 Jul;18(7):1311-1322. doi: 10.1007/s11548-023-02936-9. Epub 2023 Jun 2.

A searchable image resource of GAL4 driver expression patterns with single neuron resolution.

Elife. 2023 Feb 23;12:e80660. doi: 10.7554/eLife.80660.

A unifying force for the realization of medical AI.

NPJ Digit Med. 2022 Nov 15;5(1):172. doi: 10.1038/s41746-022-00721-7.

Methods for Clinical Evaluation of Artificial Intelligence Algorithms for Medical Diagnosis.

Radiology. 2023 Jan;306(1):20-31. doi: 10.1148/radiol.220182. Epub 2022 Nov 8.

Technology readiness levels for machine learning systems.

Nat Commun. 2022 Oct 20;13(1):6039. doi: 10.1038/s41467-022-33128-9.

The Medical Segmentation Decathlon.

Nat Commun. 2022 Jul 15;13(1):4128. doi: 10.1038/s41467-022-30695-9.

A Research Ethics Framework for the Clinical Translation of Healthcare Machine Learning.

Am J Bioeth. 2022 May;22(5):8-22. doi: 10.1080/15265161.2021.2013977. Epub 2022 Jan 20.

Baseline Photos and Confident Annotation Improve Automated Detection of Cutaneous Graft-Versus-Host Disease.

Clin Hematol Int. 2021 Jul 15;3(3):108-115. doi: 10.2991/chi.k.210704.001. eCollection 2021 Sep.

Delphi methodology in healthcare research: How to decide its appropriateness.

World J Methodol. 2021 Jul 20;11(4):116-129. doi: 10.5662/wjm.v11.i4.116.
