National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany; Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany.
National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany.
Eur J Cancer. 2019 Sep;119:11-17. doi: 10.1016/j.ejca.2019.05.023. Epub 2019 Aug 8.
Melanoma is the most dangerous type of skin cancer but is curable if detected early. Recent publications have demonstrated that artificial intelligence is capable of classifying images of benign nevi and melanoma with dermatologist-level precision. However, a statistically significant improvement compared with dermatologist classification has not been reported to date.
For this comparative study, 4204 biopsy-proven images of melanoma and nevi (1:1) were used for the training of a convolutional neural network (CNN). New techniques of deep learning were integrated. For the experiment, an additional 804 biopsy-proven dermoscopic images of melanoma and nevi (1:1) were randomly presented to dermatologists of nine German university hospitals, who evaluated the quality of each image and stated their recommended treatment (19,296 recommendations in total). Three McNemar's tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and overall correctness were predefined as the main outcomes.
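As an illustration of the paired comparison named above, the following is a minimal sketch of McNemar's test on a 2 × 2 table of paired outcomes. It assumes each image has one correct/incorrect verdict from each of two raters; the function names and the toy data are hypothetical and are not taken from the study, which does not state whether an exact or a chi-square version of the test was used.

```python
from math import comb, erfc, sqrt


def discordant_counts(a_correct, b_correct):
    """Count pairs in which exactly one of the two raters is correct."""
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact (binomial) McNemar p-value from the discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(only_a, only_b):
    """Continuity-corrected chi-square statistic and p-value (1 degree of freedom)."""
    n = only_a + only_b
    if n == 0:
        return 0.0, 1.0
    stat = (abs(only_a - only_b) - 1) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))


# Toy data (hypothetical): 1 = correct classification, 0 = incorrect.
cnn_correct = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
derm_correct = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

only_cnn, only_derm = discordant_counts(cnn_correct, derm_correct)
p_exact = mcnemar_exact(only_cnn, only_derm)
chi2_stat, p_chi2 = mcnemar_chi2(only_cnn, only_derm)
```

Only the discordant pairs enter the test: images that both raters classify correctly, or both classify incorrectly, carry no information about which rater is better.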
The respective sensitivity and specificity of lesion classification by the dermatologists were 67.2% (95% confidence interval [CI]: 62.6%-71.7%) and 62.2% (95% CI: 57.6%-66.9%). In comparison, the trained CNN achieved a higher sensitivity of 82.3% (95% CI: 78.3%-85.7%) and a higher specificity of 77.9% (95% CI: 73.8%-81.8%). The three McNemar's tests in 2 × 2 tables all reached a significance level of p < 0.001. This significance level was sustained for both subgroups (junior and board-certified dermatologists).
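To show how a 95% confidence interval for a proportion such as sensitivity can be obtained, here is a minimal sketch using the Wilson score interval. The abstract does not state which interval method was used, so the choice of Wilson is an assumption. The counts are also assumptions: 402 melanoma images follows from the 1:1 split of the 804 test images, and 331 correct calls is simply the count closest to the reported 82.3%.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Assumed counts (not given in the abstract): 331 of 402 melanomas detected.
lower, upper = wilson_interval(331, 402)
print(f"sensitivity = {331 / 402:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions the interval comes out at roughly 78% to 86%, in line with the interval reported for the CNN's sensitivity.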
For the first time, automated dermoscopic melanoma image classification was shown to be significantly superior to both junior and board-certified dermatologists (p < 0.001).