United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics and Software Reliability, Silver Spring, Maryland; National Institutes of Health, National Cancer Institute, Division of Cancer Prevention, Cancer Prevention Fellowship Program, Bethesda, Maryland.
United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics and Software Reliability, Silver Spring, Maryland.
Mod Pathol. 2024 Apr;37(4):100439. doi: 10.1016/j.modpat.2024.100439. Epub 2024 Jan 28.
This work puts forth and demonstrates the utility of a reporting framework for collecting and evaluating annotations of medical images used to train and test artificial intelligence (AI) models that assist detection and diagnosis. AI has unique reporting requirements, as shown by the AI extensions to the Consolidated Standards of Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) checklists and the proposed AI extensions to the Standards for Reporting Diagnostic Accuracy (STARD) and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklists. AI for detection and/or diagnostic image analysis requires complete, reproducible, and transparent reporting of the annotations and metadata used in training and testing data sets. In earlier work, other researchers proposed an annotation workflow and quality checklist for computational pathology. In this manuscript, we operationalize that workflow into an evaluable quality checklist that applies to any reader-interpreted medical images, and we demonstrate its use for an annotation effort in digital pathology. We refer to this quality framework as the Collection and Evaluation of Annotations for Reproducible Reporting of Artificial Intelligence (CLEARR-AI).
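As a rough illustration of what "complete, reproducible, and transparent reporting of the annotations and metadata" can look like in practice, the minimal Python sketch below defines a hypothetical annotation record that keeps each annotation together with its provenance. The class name, field names, and example values are assumptions made for illustration; they are not the CLEARR-AI checklist items themselves.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical illustration only: the fields below are assumptions about the kind of
# annotation metadata a reporting checklist such as CLEARR-AI might ask for; they are
# not taken from the published checklist.
@dataclass
class AnnotationRecord:
    image_id: str                 # identifier of the annotated image (e.g., a whole-slide image)
    region: dict                  # spatial extent of the annotation (e.g., polygon vertices in pixels)
    label: str                    # class assigned by the reader (e.g., "tumor", "stroma")
    annotator_id: str             # pseudonymous identifier of the reader
    annotator_role: str           # e.g., "board-certified pathologist", "trainee"
    annotation_tool: str          # software and version used to draw the annotation
    instructions_version: str     # version of the written annotation instructions followed
    review_status: str = "draft"  # e.g., "draft", "adjudicated", "final"
    notes: str = ""               # free-text caveats (ambiguous morphology, scanning artifacts, etc.)

record = AnnotationRecord(
    image_id="slide_0001",
    region={"type": "polygon", "vertices": [[100, 120], [180, 120], [180, 200], [100, 200]]},
    label="tumor",
    annotator_id="reader_03",
    annotator_role="board-certified pathologist",
    annotation_tool="ExampleAnnotator 2.1",
    instructions_version="v1.2",
)

# Serializing records to JSON keeps annotations and their provenance in one place,
# which supports the complete and reproducible reporting the abstract calls for.
print(json.dumps(asdict(record), indent=2))
```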