School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
Vienna Dermatologic Imaging Research Group, Department of Dermatology, Medical University of Vienna, Vienna, Austria.
JAMA Dermatol. 2019 Jan 1;155(1):58-65. doi: 10.1001/jamadermatol.2018.4378.
IMPORTANCE: Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic, and are more difficult to diagnose.
OBJECTIVE: To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience.
DESIGN, SETTING, AND PARTICIPANTS: A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases, and its results were compared with those of 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different levels of experience in dermoscopy.
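The abstract does not state how the outputs for the two imaging methods were combined. As a minimal sketch only, assuming each image type yields a per-class probability for a lesion, one simple option is a weighted average of the two predictions. The function names, the weight parameter, and the class labels below are hypothetical and are not taken from the study.

```python
def combine_predictions(dermoscopic_probs, closeup_probs, dermoscopic_weight=0.5):
    """Weighted average of two per-class probability dicts for one lesion.

    Assumption: this averaging rule is illustrative; the abstract does not
    describe the fusion method actually used for the cCNN.
    """
    if set(dermoscopic_probs) != set(closeup_probs):
        raise ValueError("both predictions must cover the same classes")
    if not 0.0 <= dermoscopic_weight <= 1.0:
        raise ValueError("dermoscopic_weight must lie in [0, 1]")
    w = dermoscopic_weight
    return {
        cls: w * dermoscopic_probs[cls] + (1.0 - w) * closeup_probs[cls]
        for cls in dermoscopic_probs
    }


def top_diagnosis(probs):
    """Return the class with the highest combined probability."""
    return max(probs, key=probs.get)


# Toy predictions for a single lesion, invented for illustration only.
toy_dermoscopic = {"basal cell carcinoma": 0.6, "actinic keratosis": 0.3, "benign": 0.1}
toy_closeup = {"basal cell carcinoma": 0.2, "actinic keratosis": 0.7, "benign": 0.1}
toy_combined = combine_predictions(toy_dermoscopic, toy_closeup)
toy_top = top_diagnosis(toy_combined)
```

With equal weights, the toy lesion's combined prediction favors the class that the close-up prediction ranks first, because that prediction is the more confident of the two.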
MAIN OUTCOMES AND MEASURES: The proportions of correct specific diagnoses and the accuracy in differentiating between benign and malignant lesions, measured as the area under the receiver operating characteristic curve, served as the main outcome measures.
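For readers unfamiliar with the second outcome measure, the area under the receiver operating characteristic curve equals the probability that a randomly chosen malignant lesion receives a higher malignancy score than a randomly chosen benign one. The sketch below computes it by direct pairwise comparison (the Mann-Whitney formulation). The function name and the toy data are assumptions for illustration, and the abstract does not say which software the authors used.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: malignancy scores, where higher means more suspicious.
    labels: 1 for malignant, 0 for benign.
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")
    # Count malignant/benign pairs that are ranked correctly; ties count half.
    wins = sum(
        1.0 if m > b else 0.5 if m == b else 0.0
        for m in malignant
        for b in benign
    )
    return wins / (len(malignant) * len(benign))


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of a few thousand lesions; a rank-based implementation would give the same value faster.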
RESULTS: The 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years) were divided into 3 groups according to years of experience with dermoscopy: beginner raters (<3 years), intermediate raters (3-10 years), and expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than that of human raters (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P < .001). With specificity fixed at the mean level of human raters (51.3%), the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). The cCNN achieved a higher percentage of correct specific diagnoses than human raters overall (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P = .001) but not than experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P = .18).
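The sensitivity comparison depends on fixing the classifier's operating point so that its specificity matches a reference value, here the raters' mean of 51.3%. A minimal sketch of that idea follows, assuming the classifier outputs a continuous malignancy score and that a case is called malignant when its score is strictly above a threshold. The abstract does not describe how the threshold was chosen, so the selection rule, the function name, and the toy data are assumptions.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Sensitivity at the lowest threshold whose specificity meets the target.

    A case is called malignant when its score is strictly above the threshold.
    Returns (threshold, achieved_specificity, sensitivity).
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")
    # Try every observed score as a threshold, from lowest to highest.
    for threshold in sorted(set(scores)):
        specificity = sum(b <= threshold for b in benign) / len(benign)
        if specificity >= target_specificity:
            sensitivity = sum(m > threshold for m in malignant) / len(malignant)
            return threshold, specificity, sensitivity
    raise ValueError("target specificity cannot be reached")


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_threshold, toy_specificity, toy_sensitivity = sensitivity_at_specificity(
    toy_scores, toy_labels, target_specificity=0.5
)
```

Choosing the lowest qualifying threshold keeps sensitivity as high as possible while still meeting the specificity target. With few cases, the achieved specificity can overshoot the target, as it does in the toy data.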
CONCLUSIONS AND RELEVANCE: Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.