Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
Department of Dermatology, University of Göttingen, Göttingen, Germany.
Eur J Cancer. 2020 Aug;135:39-46. doi: 10.1016/j.ejca.2020.04.043. Epub 2020 Jun 10.
Convolutional neural networks (CNNs) have shown dermatologist-level performance in the classification of skin lesions. We aimed to deliver a head-to-head comparison of a conventional image analyser (CIA), which depends on segmentation and weighting of handcrafted features, with a CNN trained by deep learning.
We conducted a cross-sectional study using a real-world, prospectively acquired dermoscopic dataset of 1981 skin lesions to compare the diagnostic performance of a market-approved CNN (Moleanalyzer-Pro™, developed in 2018) with that of a CIA (Moleanalyzer-3™/Dynamole™, developed in 2004; both from FotoFinder Systems Inc, Germany). As the reference standard, we used histopathological diagnoses (n = 785) or, for non-excised benign lesions (n = 1196), expert consensus plus an uneventful follow-up by sequential digital dermoscopy for at least 2 years.
A total of 281 malignant lesions and 1700 benign lesions from 435 patients (62.2% male, mean age: 52 years) were prospectively imaged. The CNN showed a sensitivity of 77.6% (95% confidence interval [CI]: [72.4%-82.1%]), a specificity of 95.3% (95% CI: [94.2%-96.2%]), and an area under the receiver operating characteristic curve (ROC-AUC) of 0.945 (95% CI: [0.930-0.961]). In contrast, the CIA achieved a sensitivity of 53.4% (95% CI: [47.5%-59.1%]), a specificity of 86.6% (95% CI: [84.9%-88.1%]), and a ROC-AUC of 0.738 (95% CI: [0.701-0.774]). The dataset included melanomas originally diagnosed by dynamic changes during sequential digital dermoscopy (52 of 201, 20.6%), which reduced the sensitivities of both classifiers. Pairwise comparisons of sensitivities, specificities, and ROC-AUCs showed that the CNN clearly outperformed the CIA (all p < 0.001).
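As a rough illustration of how sensitivity and specificity estimates of this kind relate to their confidence intervals, the sketch below computes Wilson score intervals in Python. The abstract reports only percentages and does not say which interval method was used, so both the choice of the Wilson method and the per-classifier counts (for example, 218 of 281 malignant lesions detected by the CNN) are assumptions back-calculated from the reported percentages and lesion totals. With these assumed counts the computed bounds agree with the reported ones to within rounding.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return (lower, upper) Wilson score bounds for a binomial proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: these counts are not given in the abstract. They are
# back-calculated from the reported percentages and the totals of
# 281 malignant and 1700 benign lesions.
ASSUMED_COUNTS = {
    "CNN sensitivity": (218, 281),
    "CNN specificity": (1620, 1700),
    "CIA sensitivity": (150, 281),
    "CIA specificity": (1472, 1700),
}

for label, (k, n) in ASSUMED_COUNTS.items():
    lower, upper = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The Wilson interval is used here because it behaves well for proportions near 0 or 1, such as the CNN specificity. A simple normal-approximation interval would give slightly different bounds.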
The superior diagnostic performance of the CNN argues against the continued use of older CIAs as an aid to physicians' clinical management decisions.