Heinlein Lukas, Maron Roman C, Hekler Achim, Haggenmüller Sarah, Wies Christoph, Utikal Jochen S, Meier Friedegund, Hobelsberger Sarah, Gellrich Frank F, Sergon Mildred, Hauschild Axel, French Lars E, Heinzerling Lucie, Schlager Justin G, Ghoreschi Kamran, Schlaak Max, Hilke Franz J, Poch Gabriela, Korsing Sören, Berking Carola, Heppt Markus V, Erdmann Michael, Haferkamp Sebastian, Drexler Konstantin, Schadendorf Dirk, Sondermann Wiebke, Goebeler Matthias, Schilling Bastian, Krieghoff-Henning Eva, Brinker Titus J
Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Medical Faculty, University Heidelberg, Heidelberg, Germany.
Commun Med (Lond). 2024 Sep 11;4(1):177. doi: 10.1038/s43856-024-00598-5.
Early detection of melanoma, a potentially lethal type of skin cancer with high prevalence worldwide, improves patient prognosis. In retrospective studies, artificial intelligence (AI) has proven helpful for enhancing melanoma detection. However, few prospective studies have confirmed these promising results. Existing studies are limited by small sample sizes, overly homogeneous datasets, or the exclusion of rare melanoma subtypes. These limitations prevent a fair and thorough evaluation of AI and of its generalizability, which is crucial for its application in the clinical setting.
Therefore, we assessed "All Data are Ext" (ADAE), an established open-source ensemble algorithm for detecting melanomas, by comparing its diagnostic accuracy with that of dermatologists on a prospectively collected, external, heterogeneous test set drawn from eight distinct hospitals and covering four different camera setups, rare melanoma subtypes, and special anatomical sites. We extended the algorithm with real test-time augmentation (R-TTA, i.e., providing real photographs of each lesion taken from multiple angles and averaging the predictions) and evaluated its generalization capabilities.
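The averaging step of R-TTA can be illustrated with a minimal sketch. It assumes that the classifier (here a hypothetical callable `model`) returns a melanoma probability for a single photograph, and that a lesion is called melanoma when the averaged probability reaches a threshold; the function names and the 0.5 threshold are illustrative assumptions, not details of ADAE.

```python
from statistics import fmean
from typing import Callable, Sequence, TypeVar

Image = TypeVar("Image")


def r_tta_predict(model: Callable[[Image], float], photos: Sequence[Image]) -> float:
    """Average the per-photograph melanoma probabilities for one lesion.

    Hypothetical helper: `model` maps one real photograph to a probability.
    """
    if not photos:
        raise ValueError("at least one photograph per lesion is required")
    return fmean(model(photo) for photo in photos)


def classify_lesion(model: Callable[[Image], float], photos: Sequence[Image],
                    threshold: float = 0.5) -> bool:
    """Return True (melanoma) if the averaged probability reaches the assumed threshold."""
    return r_tta_predict(model, photos) >= threshold


# Usage with a stub "model" that looks up pre-computed probabilities per photograph.
stub_probabilities = {"front.jpg": 0.62, "left.jpg": 0.41, "right.jpg": 0.55}
lesion_photos = list(stub_probabilities)
averaged = r_tta_predict(stub_probabilities.__getitem__, lesion_photos)
is_melanoma = classify_lesion(stub_probabilities.__getitem__, lesion_photos)
```

Averaging over several real views, rather than over synthetic flips or crops of a single image, means that one unfavorable photograph (for example, glare or an oblique angle) has less influence on the final call.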
Overall, the AI shows higher balanced accuracy than dermatologists (0.798, 95% confidence interval (CI) 0.779-0.814 vs. 0.781, 95% CI 0.760-0.802; p = 4.0e-145), obtaining a higher sensitivity (0.921, 95% CI 0.900-0.942 vs. 0.734, 95% CI 0.701-0.770; p = 3.3e-165) at the cost of a lower specificity (0.673, 95% CI 0.641-0.702 vs. 0.828, 95% CI 0.804-0.852; p = 3.3e-165).
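Balanced accuracy is the mean of sensitivity and specificity, which is why a large gain in sensitivity can outweigh a loss in specificity. The sketch below computes it from confusion-matrix counts; the counts in the usage example are hypothetical and are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: melanomas correctly flagged / all melanomas."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: benign lesions correctly cleared / all benign lesions."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


# Hypothetical counts: 100 melanomas and 100 benign lesions.
example_ba = balanced_accuracy(tp=92, fn=8, tn=67, fp=33)

# Mean of the reported (rounded) sensitivity and specificity values.
ai_ba = (0.921 + 0.673) / 2
derm_ba = (0.734 + 0.828) / 2
```

Averaging the reported values gives 0.797 for the AI, close to the reported 0.798 (the small gap may reflect rounding of the component values or the exact averaging procedure used in the study), and 0.781 for the dermatologists.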
Because the algorithm shows a significant performance advantage on our heterogeneous dataset, which comprises exclusively melanoma-suspicious lesions, AI may help support dermatologists, particularly in diagnosing challenging cases.