Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
Public, Private and Teaching Practice of Dermatology, Konstanz, Germany.
JAMA Dermatol. 2023 Jun 1;159(6):621-627. doi: 10.1001/jamadermatol.2023.0905.
IMPORTANCE: Studies suggest that convolutional neural networks (CNNs) perform on par with trained dermatologists in skin lesion classification tasks. Although the first neural networks have been approved for clinical use, prospective studies demonstrating the benefits of human with machine cooperation are lacking.
OBJECTIVE: To assess whether dermatologists benefit from cooperation with a market-approved CNN in classifying melanocytic lesions.
DESIGN, SETTING, AND PARTICIPANTS: In this prospective diagnostic 2-center study, dermatologists performed skin cancer screenings using naked-eye examination and dermoscopy. Dermatologists graded suspect melanocytic lesions by the probability of malignancy (range 0-1, threshold for malignancy ≥0.5) and indicated management decisions (no action, follow-up, excision). Next, dermoscopic images of suspect lesions were assessed by a market-approved CNN, Moleanalyzer Pro (FotoFinder Systems). The CNN malignancy scores (range 0-1, threshold for malignancy ≥0.5) were reported to the dermatologists, who were asked to re-evaluate the lesions and revise their initial decisions in light of the CNN results. Reference diagnoses were based on histopathologic examination for 125 lesions (54.8%) or, for nonexcised lesions, on clinical follow-up data and expert consensus. Data were collected from October 2020 to October 2021.
MAIN OUTCOMES AND MEASURES: Primary outcome measures were the diagnostic sensitivity and specificity of dermatologists alone and of dermatologists cooperating with the CNN. Accuracy and the area under the receiver operating characteristic curve (ROC AUC) were considered as additional measures.
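The primary measures follow from thresholding each malignancy score at 0.5 and comparing the resulting label with the reference diagnosis. The following is a minimal Python sketch of that calculation, not the study's analysis code; the function name `diagnostic_metrics` and the sample scores are hypothetical and chosen only for illustration.

```python
def diagnostic_metrics(scores, is_melanoma, threshold=0.5):
    """Sensitivity, specificity, and accuracy for thresholded scores.

    A score >= threshold is read as "malignant", matching the cutoff
    stated in the abstract. Assumes at least one melanoma and one nevus,
    otherwise a division by zero occurs.
    """
    tp = fn = tn = fp = 0
    for score, malignant in zip(scores, is_melanoma):
        predicted_malignant = score >= threshold
        if malignant and predicted_malignant:
            tp += 1  # melanoma correctly flagged
        elif malignant:
            fn += 1  # melanoma missed
        elif predicted_malignant:
            fp += 1  # nevus wrongly flagged
        else:
            tn += 1  # nevus correctly cleared
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical data (not from the study): 2 melanomas and 3 nevi.
example = diagnostic_metrics(
    scores=[0.9, 0.4, 0.2, 0.6, 0.1],
    is_melanoma=[True, True, False, False, False],
)
print(example)
```

ROC AUC is left out of the sketch because it is computed from the full range of scores rather than from a single threshold.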
RESULTS: A total of 22 dermatologists detected 228 suspect melanocytic lesions (190 nevi, 38 melanomas) in 188 patients (mean [range] age, 53.4 [19-91] years; 97 [51.6%] male patients). Diagnostic sensitivity and specificity improved significantly when dermatologists additionally integrated CNN results into decision-making (mean sensitivity from 84.2% [95% CI, 69.6%-92.6%] to 100.0% [95% CI, 90.8%-100.0%]; P = .03; mean specificity from 72.1% [95% CI, 65.3%-78.0%] to 83.7% [95% CI, 77.8%-88.3%]; P < .001; mean accuracy from 74.1% [95% CI, 68.1%-79.4%] to 86.4% [95% CI, 81.3%-90.3%]; P < .001; and mean ROC AUC from 0.895 [95% CI, 0.836-0.954] to 0.968 [95% CI, 0.948-0.988]; P = .005). Compared with dermatologists alone, the CNN alone achieved comparable sensitivity, higher specificity, and higher diagnostic accuracy in classifying melanocytic lesions. Moreover, unnecessary excisions of benign nevi were reduced by 19.2%, from 104 (54.7%) of 190 benign nevi to 84 (44.2%), when dermatologists cooperated with the CNN (P < .001). Most lesions were examined by dermatologists with 2 to 5 years of experience (96 [42.1%]) or less than 2 years of experience (78 [34.2%]); the remaining lesions (54 [23.7%]) were evaluated by dermatologists with more than 5 years of experience. When cooperating with the CNN, dermatologists with less dermoscopy experience showed greater diagnostic improvement than more experienced dermatologists.
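The reported percentages can be cross-checked against the lesion counts. The sketch below is a reader's consistency check under stated assumptions, not the authors' analysis: the counts 32 of 38 and 137 of 190 are back-calculated from the reported percentages rather than given in the abstract, and the Wilson score interval is an assumed method, since the abstract does not say how the confidence intervals were computed.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts back-calculated from the reported percentages (assumption).
sens_alone = 32 / 38    # dermatologists alone, 38 melanomas
spec_alone = 137 / 190  # dermatologists alone, 190 nevi
sens_lo, sens_hi = wilson_interval(32, 38)
spec_lo, spec_hi = wilson_interval(137, 190)

# Relative reduction in excisions of benign nevi, 104 -> 84 (from the abstract).
excision_reduction = (104 - 84) / 104

print(f"sensitivity {sens_alone:.1%} ({sens_lo:.1%}-{sens_hi:.1%})")
print(f"specificity {spec_alone:.1%} ({spec_lo:.1%}-{spec_hi:.1%})")
print(f"excision reduction {excision_reduction:.1%}")
```

Under these assumptions the output agrees with the reported 84.2% (69.6%-92.6%), 72.1% (65.3%-78.0%), and 19.2%. That agreement is consistent with a Wilson interval but does not confirm the method actually used.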
CONCLUSIONS AND RELEVANCE: The findings of this prospective diagnostic study suggest that dermatologists may improve their performance when they cooperate with the market-approved CNN and that a broader application of this human with machine approach could be beneficial for dermatologists and patients.