Effects of Label Noise on Deep Learning-Based Skin Cancer Classification.

Author information

Hekler Achim, Kather Jakob N, Krieghoff-Henning Eva, Utikal Jochen S, Meier Friedegund, Gellrich Frank F, Upmeier Zu Belzen Julius, French Lars, Schlager Justin G, Ghoreschi Kamran, Wilhelm Tabea, Kutzner Heinz, Berking Carola, Heppt Markus V, Haferkamp Sebastian, Sondermann Wiebke, Schadendorf Dirk, Schilling Bastian, Izar Benjamin, Maron Roman, Schmitt Max, Fröhling Stefan, Lipka Daniel B, Brinker Titus J

Affiliations

National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany.

Department of Medicine III, RWTH University Hospital Aachen, Aachen, Germany.

Publication information

Front Med (Lausanne). 2020 May 6;7:177. doi: 10.3389/fmed.2020.00177. eCollection 2020.

Abstract

Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths for training and test labels, accuracies of 75.03% (95% CI: 74.39-75.66%) for dermatological and 73.80% (95% CI: 73.10-74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12-65.94%, P < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66-65.83%, P < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem.
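The comparison described in the abstract, scoring the same classifier outputs against two different ground truths, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the helper names `accuracy` and `mean_ci95` are hypothetical, the eight toy labels are made up and are not study data, and the normal-approximation confidence interval is only one common choice, since the abstract does not state how the reported intervals were computed.

```python
import math
from statistics import mean, stdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the given ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_ci95(values):
    """Mean and normal-approximation 95% CI over per-run accuracies.

    Assumption: the abstract does not specify the CI method; this is one
    common choice, not necessarily the one used in the study.
    """
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy, made-up labels for 8 lesions (1 = melanoma, 0 = nevus); NOT study data.
biopsy_labels = [1, 1, 0, 0, 1, 0, 0, 1]
dermatologist_labels = [1, 0, 0, 0, 1, 1, 0, 1]  # disagrees with biopsy on two lesions
# Hypothetical outputs of a model trained on dermatologist labels.
predictions = [1, 0, 0, 0, 1, 1, 0, 0]

# Same predictions, two different ground truths.
acc_same_truth = accuracy(predictions, dermatologist_labels)
acc_cross_truth = accuracy(predictions, biopsy_labels)
print(f"scored against dermatologist labels: {acc_same_truth:.3f}")
print(f"scored against biopsy labels:        {acc_cross_truth:.3f}")
```

In this toy setup the model inherits the dermatologists' two disagreements with biopsy, so its accuracy is lower when scored against biopsy labels. This is the kind of train/test ground-truth mismatch the abstract quantifies. In a real evaluation, `mean_ci95` would be applied to the accuracies collected across repeated training runs or folds.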
