Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
Eur J Cancer. 2021 Feb;144:192-199. doi: 10.1016/j.ejca.2020.11.034. Epub 2020 Dec 25.
The clinical differentiation of face and scalp lesions (FSLs) is challenging even for trained dermatologists. Studies comparing the diagnostic performance of a convolutional neural network (CNN) with dermatologists in FSL are lacking.
A market-approved CNN (Moleanalyzer-Pro, FotoFinder Systems) was used for binary classifications of 100 dermoscopic images of FSL. The same lesions were used in a two-level reader study including 64 dermatologists (level I: dermoscopy only; level II: dermoscopy, clinical close-up images, textual information). Primary endpoints were the CNN's sensitivity and specificity in comparison with the dermatologists' management decisions in level II. Generalizability of the CNN results was tested by using four additional external data sets.
The CNN's sensitivity, specificity and ROC AUC were 96.2% [87.0%-98.9%], 68.8% [54.7%-80.1%] and 0.929 [0.880-0.978], respectively. In level II, the dermatologists' management decisions showed a mean sensitivity of 84.2% [82.2%-86.2%] and a mean specificity of 69.4% [66.0%-72.8%]. With the CNN's specificity fixed at the dermatologists' mean specificity (69.4%), the CNN's sensitivity (96.2% [87.0%-98.9%]) was significantly higher than that of the dermatologists (84.2% [82.2%-86.2%]; p < 0.001). The CNN outperformed dermatologists of all training levels (all p < 0.001). Consistent with this, the CNN's accuracy (83.0%) was significantly higher than the dermatologists' accuracies in level II management decisions (all p < 0.001). The CNN's performance was largely confirmed in three of the four additional external data sets; in the fourth, an Australian data set including FSL on severely sun-damaged skin, its specificity was notably reduced.
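The relationship between the reported sensitivity, specificity and accuracy can be illustrated with a small sketch. The abstract gives neither the underlying counts nor the method used for the bracketed intervals. The confusion-matrix counts below (52 malignant and 48 benign lesions among the 100 images) are an assumption, chosen only because they reproduce the reported point estimates. The Wilson score interval at z = 1.96 is likewise an assumption that happens to match the reported ranges.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Hypothetical counts, NOT taken from the abstract: they are merely
# consistent with 96.2% sensitivity, 68.8% specificity, 83.0% accuracy.
tp, fn = 50, 2    # malignant lesions: correctly flagged / missed
tn, fp = 33, 15   # benign lesions: correctly cleared / falsely flagged

sensitivity = tp / (tp + fn)                 # share of malignant lesions detected
specificity = tn / (tn + fp)                 # share of benign lesions cleared
accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all lesions classified correctly

sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity {sensitivity:.1%} [{sens_ci[0]:.1%}-{sens_ci[1]:.1%}]")
print(f"specificity {specificity:.1%} [{spec_ci[0]:.1%}-{spec_ci[1]:.1%}]")
print(f"accuracy    {accuracy:.1%}")
```

Under these assumed counts, accuracy is a prevalence-weighted mix of sensitivity and specificity, which is why a high sensitivity can coexist with a moderate overall accuracy when specificity is lower.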
When applied as an assistant system, the CNN's higher sensitivity at an equivalent specificity may result in an improved early detection of face and scalp skin cancers.