Recommendations on compiling test datasets for evaluating artificial intelligence solutions in pathology.

Affiliations

Fraunhofer Institute for Digital Medicine MEVIS, Max-von-Laue-Straße 2, 28359, Bremen, Germany.

Technische Universität Berlin, DAI-Labor, Ernst-Reuter-Platz 7, 10587, Berlin, Germany.

Publication information

Mod Pathol. 2022 Dec;35(12):1759-1769. doi: 10.1038/s41379-022-01147-y. Epub 2022 Sep 10.

DOI:10.1038/s41379-022-01147-y

PMID:36088478

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9708586/

Abstract

Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations on compiling test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help pathologists and regulatory agencies verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
