Division of Dermatology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
J Invest Dermatol. 2023 Aug;143(8):1423-1429.e1. doi: 10.1016/j.jid.2022.08.058. Epub 2023 Feb 18.
Artificial intelligence algorithms to classify melanoma are dependent on their training data, which limits generalizability. The objective of this study was to compare the performance of an artificial intelligence model trained on a standard adult-predominant dermoscopic dataset before and after the addition of pediatric training images. We trained two models: one (model A) on an adult-predominant dataset (37,662 images from the International Skin Imaging Collaboration) and the other (model A+P) on the same dataset plus an additional 1,536 pediatric images. We compared the performance of the two models on held-out adult and pediatric test images separately using the area under the receiver operating characteristic curve. We then used Gradient-weighted Class Activation Maps and background skin masking to understand the contributions of the lesion versus background skin to algorithm decision making. Adding images from a pediatric population with different epidemiological and visual patterns to current reference standard datasets improved algorithm performance on pediatric images without diminishing performance on adult images. This suggests one way to make dermatologic artificial intelligence models more generalizable. The presence of background skin was important to the pediatric-specific improvement seen between models. Our study highlights the importance of carefully curated and labeled data from diverse inputs for improving the generalizability of artificial intelligence models for dermatology, in this case applied to dermoscopic images of adult and pediatric lesions to improve melanoma detection.
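The per-test-set comparison described above can be illustrated with a minimal sketch. It computes the area under the receiver operating characteristic curve from the rank-sum (Mann-Whitney U) identity and tabulates it for each model on each held-out set. This is not the study's code: the function names `auroc` and `compare_models`, the dictionary layout, and the toy labels and scores are all assumptions made for illustration, and the numbers are synthetic rather than study results.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = melanoma, 0 = non-melanoma; scores: model outputs.
    Tied scores receive their average rank.
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def compare_models(
    test_labels: Dict[str, Sequence[int]],
    model_scores: Dict[str, Dict[str, Sequence[float]]],
) -> Dict[str, Dict[str, float]]:
    """Return {test_set: {model: AUROC}} for every model on every test set."""
    return {
        set_name: {
            model: auroc(labels, per_set[set_name])
            for model, per_set in model_scores.items()
        }
        for set_name, labels in test_labels.items()
    }


# Synthetic toy data (hypothetical, NOT results from the study).
toy_labels = {"adult": [0, 0, 1, 1], "pediatric": [0, 1, 0, 1]}
toy_scores = {
    "A": {"adult": [0.1, 0.4, 0.35, 0.8], "pediatric": [0.6, 0.5, 0.2, 0.7]},
    "A+P": {"adult": [0.1, 0.4, 0.35, 0.8], "pediatric": [0.3, 0.6, 0.2, 0.9]},
}
results = compare_models(toy_labels, toy_scores)
```

Reporting the metric separately for each test set, rather than pooling adult and pediatric images, is what lets a gain on one population be checked against a possible loss on the other.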