School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
Population Health Sciences Institute, School of Pharmacy, Newcastle University, Newcastle NE1 7RU, United Kingdom of Great Britain and Northern Ireland.
Artif Intell Med. 2024 Sep;155:102934. doi: 10.1016/j.artmed.2024.102934. Epub 2024 Jul 25.
Melanoma is a serious risk to human health and early identification is vital for treatment success. Deep learning (DL) has the potential to detect cancer using imaging technologies and many studies provide evidence that DL algorithms can achieve high accuracy in melanoma diagnostics.
To critically assess the performance of different DL models in diagnosing melanoma from dermatoscopic images and to discuss the relationship between dermatologists and DL.
Ovid-Medline, Embase, IEEE Xplore, and the Cochrane Library were systematically searched from inception until 7 December 2021. Studies that reported the diagnostic performance of DL models in detecting melanoma from dermatoscopic images were included if they had specific outcomes and histopathologic confirmation. Binary diagnostic accuracy data and contingency tables were extracted to analyze the outcomes of interest, which were sensitivity (SEN), specificity (SPE), and area under the curve (AUC). Subgroup analyses were performed according to human-machine comparison and cooperation. The study was registered in PROSPERO, CRD42022367824.
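For readers less familiar with the outcome measures, the following minimal Python sketch shows how SEN and SPE are conventionally derived from a single 2x2 contingency table. The class name `ContingencyTable` and the counts are hypothetical illustrations and are not data from any included study; the sketch also does not reproduce the pooling model used in the meta-analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table for one binary diagnostic test (hypothetical helper)."""
    tp: int  # melanomas correctly flagged as melanoma
    fp: int  # benign lesions wrongly flagged as melanoma
    fn: int  # melanomas missed by the test
    tn: int  # benign lesions correctly cleared

    def sensitivity(self) -> float:
        # Proportion of histopathology-confirmed melanomas that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of benign lesions that were correctly ruled out.
        return self.tn / (self.tn + self.fp)


# Illustrative counts only (assumed, not taken from the review).
example = ContingencyTable(tp=40, fp=10, fn=10, tn=90)
sen = example.sensitivity()
spe = example.specificity()
print(f"SEN = {sen:.2f}, SPE = {spe:.2f}")
```

In a meta-analysis, one such table would be extracted per study (or per reader group) before the study-level estimates are pooled.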
A total of 2309 records were initially retrieved, of which 37 studies met the inclusion criteria and 27 provided sufficient data for meta-analytical synthesis. The pooled SEN was 82% (range 77-86), the pooled SPE was 87% (range 84-90), and the AUC was 0.92 (range 0.89-0.94). In the human-machine comparison, pooled AUCs were 0.87 (0.84-0.90) for DL and 0.83 (0.79-0.86) for dermatologists. By experience level, pooled AUCs were 0.90 (0.87-0.93) for DL, 0.80 (0.76-0.83) for junior dermatologists, and 0.88 (0.85-0.91) for senior dermatologists. In the analyses of human-machine cooperation, the values were 0.88 (0.85-0.91) for DL, 0.76 (0.72-0.79) for unassisted dermatologists, and 0.87 (0.84-0.90) for DL-assisted dermatologists.
Evidence suggests that DL algorithms are as accurate as senior dermatologists in melanoma diagnostics. DL could therefore be used to support dermatologists in diagnostic decision-making. However, further high-quality, large-scale multicenter studies are required to address the specific challenges associated with medical AI-based diagnostics.