Springenberg Maximilian, Frommholz Annika, Wenzel Markus, Weicken Eva, Ma Jackie, Strodthoff Nils
Fraunhofer Heinrich Hertz Institute, Einsteinufer 37, 10587 Berlin, Germany.
Med Image Anal. 2023 Jul;87:102809. doi: 10.1016/j.media.2023.102809. Epub 2023 Apr 28.
While machine learning is currently transforming the field of histopathology, the domain lacks a comprehensive evaluation of state-of-the-art models against essential but complementary quality requirements beyond classification accuracy alone. To fill this gap, we developed a new methodology to extensively evaluate a wide range of classification models, including recent vision transformers and convolutional neural networks such as ConvNeXt, ResNet (BiT), Inception, ViT, and Swin Transformer, with and without supervised or self-supervised pretraining. We thoroughly tested the models on five widely used histopathology datasets containing whole slide images of breast, gastric, and colorectal cancer, and developed a novel approach that uses an image-to-image translation model to assess the robustness of a cancer classification model against stain variations. Further, we extended existing interpretability methods to previously unstudied models and systematically revealed insights into the models' classification strategies that allow for plausibility checks and systematic comparisons. The study yields specific model recommendations for practitioners and puts forward a general methodology for quantifying a model's quality according to complementary requirements, which can be transferred to future model architectures.