A benchmark for neural network robustness in skin cancer classification.

Affiliations

Digital Biomarkers for Oncology Group, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany.

Department of Dermatology and Allergy, University Hospital, LMU Munich, Munich, Germany.

Publication information

Eur J Cancer. 2021 Sep;155:191-199. doi: 10.1016/j.ejca.2021.06.047. Epub 2021 Aug 11.

Abstract

BACKGROUND

One prominent application of deep learning-based classifiers is skin cancer classification on dermoscopic images. However, classifier evaluation is often limited to holdout data, which can mask common shortcomings such as susceptibility to confounding factors. To increase clinical applicability, it is necessary to thoroughly evaluate such classifiers on out-of-distribution (OOD) data.

OBJECTIVE

The objective of the study was to establish a dermoscopic skin cancer benchmark in which classifier robustness to OOD data can be measured.

METHODS

Using a proprietary dermoscopic image database and a set of image transformations, we create an OOD robustness benchmark and evaluate the robustness of four different convolutional neural network (CNN) architectures on it.
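
To make this kind of robustness check concrete, the sketch below compares a CNN's accuracy on clean versus artificially corrupted copies of a dermoscopic image folder. It is a minimal example under assumptions: a PyTorch/torchvision setup (torchvision >= 0.13 for the weights API), an ImageNet-initialised ResNet-50 with a binary head, a hypothetical local image path, and two simple corruptions that only stand in for the benchmark's actual SAM-C transformations and the four architectures evaluated in the study.

```python
# Minimal sketch, not the authors' pipeline: compare a CNN's accuracy on
# clean vs. artificially corrupted dermoscopic images. The corruption set,
# the ImageNet-initialised ResNet-50 and the image path are assumptions.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# In practice a classifier trained on dermoscopic data would be loaded here.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # melanoma / nevus head
model.eval().to(device)

base = [transforms.Resize(256), transforms.CenterCrop(224)]
variants = {
    "clean": transforms.Compose(base + [transforms.ToTensor()]),
    "blur": transforms.Compose(base + [transforms.GaussianBlur(9, sigma=3.0),
                                       transforms.ToTensor()]),
    "low_contrast": transforms.Compose(base + [transforms.ColorJitter(contrast=(0.3, 0.3)),
                                               transforms.ToTensor()]),
}

@torch.no_grad()
def accuracy(image_dir: str, tf) -> float:
    # ImageFolder expects one subfolder per class, e.g. melanoma/ and nevus/.
    ds = datasets.ImageFolder(image_dir, transform=tf)
    loader = DataLoader(ds, batch_size=32)
    correct = total = 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

for name, tf in variants.items():
    print(name, accuracy("path/to/dermoscopy_images", tf))  # hypothetical path
```

Comparing the per-variant accuracies against the clean baseline gives a simple, if coarse, measure of how much a classifier degrades on low-quality inputs.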

RESULTS

The benchmark contains three data sets, Skin Archive Munich (SAM), SAM-corrupted (SAM-C) and SAM-perturbed (SAM-P), and is publicly available for download. To maintain the benchmark's OOD status, ground truth labels are not provided and test results should be sent to us for assessment. The SAM data set contains 319 unmodified and biopsy-verified dermoscopic melanoma (n = 194) and nevus (n = 125) images. SAM-C and SAM-P contain images from SAM which were artificially modified to test a classifier against low-quality inputs and to measure its prediction stability over small image changes, respectively. All four CNNs showed susceptibility to corruptions and perturbations.
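
The prediction-stability idea behind SAM-P can be illustrated with a short flip-rate computation: run the same lesion image through a sequence of gradually stronger but visually small perturbations and count how often the predicted class changes. The sketch below is an illustrative assumption rather than the published SAM-P protocol; small rotations stand in for the benchmark's perturbation sequences, and flip_rate is a hypothetical helper that takes any trained binary classifier.

```python
# Minimal sketch, an assumption rather than the published SAM-P protocol:
# prediction stability as the fraction of consecutive steps in a perturbation
# sequence on which the predicted class flips. Small rotations stand in for
# the benchmark's perturbation sequences; flip_rate is a hypothetical helper.
import torch
from PIL import Image
from torchvision import transforms

@torch.no_grad()
def flip_rate(model, image_path: str, device: str = "cpu") -> float:
    img = Image.open(image_path).convert("RGB")
    prep = transforms.Compose([transforms.Resize(256),
                               transforms.CenterCrop(224),
                               transforms.ToTensor()])
    preds = []
    for angle in range(0, 16):  # gradually stronger, visually small rotations
        x = prep(img.rotate(angle)).unsqueeze(0).to(device)
        preds.append(model(x).argmax(dim=1).item())
    flips = sum(p != q for p, q in zip(preds, preds[1:]))
    return flips / (len(preds) - 1)  # 0.0 = fully stable predictions
```

A perfectly robust classifier would keep the same prediction across the whole sequence; frequent flips indicate sensitivity to changes that should not alter the diagnosis.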

CONCLUSIONS

This benchmark provides three data sets which allow for OOD testing of binary skin cancer classifiers. Our classifier performance confirms the shortcomings of CNNs and provides a frame of reference. Altogether, this benchmark should facilitate a more thorough evaluation process and thereby enable the development of more robust skin cancer classifiers.
