Helmholtz AI, Helmholtz Munich-German Research Center for Environmental Health, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
Institute of AI for Health, Helmholtz Munich-German Research Center for Environmental Health, Neuherberg, Germany; Institute of Pathology, University Hospital Erlangen, Erlangen, Germany.
Mod Pathol. 2024 Jan;37(1):100350. doi: 10.1016/j.modpat.2023.100350. Epub 2023 Oct 10.
Recent progress in computational pathology has been driven by deep learning. While code and data availability are essential to reproduce findings from preceding publications, ensuring a deep learning model's reusability is more challenging. For that, the codebase should be well-documented and easy to integrate into existing workflows, and models should be robust to noise and generalize to data from different sources. Strikingly, only a few computational pathology algorithms have been reused by other researchers so far, let alone employed in a clinical setting. To assess the current state of reproducibility and reusability of computational pathology algorithms, we evaluated peer-reviewed articles available in PubMed, published between January 2019 and March 2021, covering 5 use cases: stain normalization; tissue type segmentation; evaluation of cell-level features; genetic alteration prediction; and inference of grading, staging, and prognostic information. We compiled criteria for data and code availability and for statistical analysis of results, and assessed 160 publications against them. We found that only one-quarter (41 of 160 publications) made code publicly available. Among these 41 studies, three-quarters (30 of 41) analyzed their results statistically, about half (20 of 41) released their trained model weights, and approximately a third (16 of 41) used an independent cohort for evaluation. Our review is intended both for pathologists interested in deep learning and for researchers applying algorithms to computational pathology challenges. We provide a detailed overview of publications in the field that released their code, list reusable data handling tools, and provide criteria for reproducibility and reusability.