Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Med Image Anal. 2023 Oct;89:102914. doi: 10.1016/j.media.2023.102914. Epub 2023 Jul 28.
In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, deep learning models deployed in high-risk environments need to be able to judge their own uncertainty and to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on selective classification, where the model should abstain from classifying when it is uncertain. We conduct our experiments at the tile level under domain shift and label noise, as well as at the slide level. We compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, and Test-Time Data Augmentation, as well as ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates and to increased robustness towards domain shift and label noise, whereas, contrary to results from classical computer vision benchmarks, no systematic gain can be shown for the other methods. Across methods, rejecting the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we compare these methods under varying levels of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
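The selective classification setting described above can be illustrated with a minimal sketch. It is not taken from the published code framework: the function names, the array layout, and the choice of predictive entropy as the uncertainty score are all assumptions made for illustration, since the abstract does not state which uncertainty measure is used. The sketch averages the softmax outputs of several ensemble members, scores each tile by the entropy of the averaged distribution, rejects the most uncertain fraction of tiles, and reports accuracy on the remainder.

```python
import numpy as np


def predictive_entropy(member_probs):
    """Entropy of the member-averaged class distribution.

    member_probs: array of shape (members, samples, classes) holding
    softmax outputs (hypothetical layout, assumed for this sketch).
    Returns an array of shape (samples,).
    """
    mean_probs = member_probs.mean(axis=0)
    # Small constant avoids log(0) for classes with zero probability.
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)


def selective_accuracy(member_probs, labels, reject_fraction):
    """Accuracy on the samples kept after rejecting the most uncertain ones."""
    labels = np.asarray(labels)
    preds = member_probs.mean(axis=0).argmax(axis=1)
    uncertainty = predictive_entropy(member_probs)

    n_keep = len(labels) - int(np.floor(reject_fraction * len(labels)))
    if n_keep <= 0:
        raise ValueError("reject_fraction leaves no samples to evaluate")

    # Keep the n_keep samples with the lowest uncertainty.
    keep = np.argsort(uncertainty, kind="stable")[:n_keep]
    return float((preds[keep] == labels[keep]).mean())
```

Sweeping `reject_fraction` from 0 towards 1 and plotting the resulting accuracy gives an accuracy-rejection curve, which is one common way to compare uncertainty methods in this setting. Other scores, such as the maximum softmax probability, could be substituted for the entropy.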