Tognetti Linda, Miracapillo Chiara, Leonardelli Simone, Luschi Alessio, Iadanza Ernesto, Cevenini Gabriele, Rubegni Pietro, Cartocci Alessandra
Dermatology Unit, Department of Medical, Surgical and Neurosciences, University of Siena, Viale Bracci 16, 53100 Siena, Italy.
Bioengineering and Biomedical Data Science Lab, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy.
Bioengineering (Basel). 2024 Jul 26;11(8):758. doi: 10.3390/bioengineering11080758.
Scientific interest in deep learning (DL) techniques applied to skin cancer diagnosis has grown over the last decade. Although encouraging data have been reported worldwide, studies differ considerably in methodology, presentation of results, and validation in clinical settings. The present review aimed to screen the scientific literature on the application of DL techniques to the dermoscopic differential diagnosis of melanoma and nevi, and to extract those original studies that adequately report on a DL model and compare it with clinicians and/or another DL architecture. The second aim was to examine those studies together according to a standard set of statistical measures. The third was to provide dermatologists with a comprehensive explanation and definition of the most commonly used artificial intelligence (AI) terms, to help them better understand the scientific literature on this topic and, in parallel, stay updated on the newest applications in medical dermatology, along with a historical perspective. After screening nearly 2000 records, a subset of 54 was selected. The 20 studies reporting on convolutional neural network (CNN)/deep convolutional neural network (DCNN) models depict highly performant DL algorithms, especially in terms of low false positive results, with average values of accuracy (83.99%), sensitivity (77.74%), and specificity (80.61%). In the comparison with diagnoses by clinicians (13 studies), the main difference lies in the specificity values, with a +15.63% increase for the CNN/DCNN models (average specificity of 84.87%) compared to humans (average specificity of 64.24%) and a 14.85% gap in average accuracy; the sensitivity values were comparable (79.77% for DL and 79.78% for humans).
To obtain higher diagnostic accuracy and feasibility in clinical practice, rather than in experimental retrospective settings, future DL models should be based on large datasets that integrate dermoscopic images with relevant clinical and anamnestic data, and should be prospectively tested and adequately compared with physicians.
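The accuracy, sensitivity, and specificity values reported in this abstract are standard confusion-matrix measures. As a minimal sketch of how they are usually defined, assuming melanoma is the positive class and nevus the negative class: the `ConfusionCounts` class and the example counts below are hypothetical illustrations, not data or code from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary melanoma (positive) vs. nevus (negative) classifier."""
    tp: int  # melanomas correctly classified as melanoma
    fn: int  # melanomas missed (classified as nevus)
    tn: int  # nevi correctly classified as nevus
    fp: int  # nevi wrongly classified as melanoma


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true melanomas that the classifier detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of true nevi that the classifier correctly leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all lesions that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=80, fn=20, tn=170, fp=30)

print(f"sensitivity = {sensitivity(example):.2%}")
print(f"specificity = {specificity(example):.2%}")
print(f"accuracy    = {accuracy(example):.2%}")
```

Under these definitions, a higher specificity means fewer nevi falsely flagged as melanoma, which is the sense in which the abstract speaks of low false positive results.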