Fraunhofer Institute for Digital Medicine MEVIS, Max-von-Laue-Straße 2, 28359, Bremen, Germany.
Technische Universität Berlin, DAI-Labor, Ernst-Reuter-Platz 7, 10587, Berlin, Germany.
Mod Pathol. 2022 Dec;35(12):1759-1769. doi: 10.1038/s41379-022-01147-y. Epub 2022 Sep 10.
Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Before they are used routinely, their predictive performance must be evaluated and regulatory approval obtained. This assessment requires appropriate test datasets. However, compiling such datasets is challenging, and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for compiling test datasets. We address several questions: Which images are needed, and how many? How should low-prevalence subsets be handled? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help pathologists and regulatory agencies verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets, so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.