Kentley Jonathan, Weber Jochen, Liopyris Konstantinos, Braun Ralph P, Marghoob Ashfaq A, Quigley Elizabeth A, Nelson Kelly, Prentice Kira, Duhaime Erik, Halpern Allan C, Rotemberg Veronica
Department of Dermatology, Chelsea and Westminster Hospital, London, United Kingdom.
Dermatology Section, Memorial Sloan Kettering Cancer Center, New York, NY, United States.
JMIR Med Inform. 2023 Jan 18;11:e38412. doi: 10.2196/38412.
Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts on the identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated as a cost- and time-efficient method for the annotation of medical images.
The aim of this study is to demonstrate that crowdsourcing can be used to label basic dermoscopic structures in images of pigmented lesions with reliability similar to that of a group of experts.
First, we obtained labels for 248 images of melanocytic lesions, with 31 dermoscopic "subfeatures" annotated by 20 dermoscopy experts. Because interrater reliability (IRR) for the subfeatures was low, they were collapsed into 6 dermoscopic "superfeatures" based on structural similarity: dots, globules, lines, network structures, regression structures, and vessels. These images were then used as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters.
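The abstract names Cohen κ as the agreement metric but does not spell out how the per-feature medians were computed; a plausible reading is the median of pairwise κ values across raters. The sketch below illustrates that computation with synthetic presence/absence labels; the function name and the random data are illustrative assumptions, not the study's code or data.

```python
# Minimal sketch: median pairwise Cohen kappa across raters for one
# superfeature. Synthetic data only; not the study's actual ratings.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score


def median_pairwise_kappa(ratings: np.ndarray) -> float:
    """ratings: (n_raters, n_images) binary presence/absence matrix.

    Computes Cohen kappa for every pair of raters and returns the median.
    """
    kappas = [
        cohen_kappa_score(ratings[i], ratings[j])
        for i, j in combinations(range(ratings.shape[0]), 2)
    ]
    return float(np.median(kappas))


# Example: 7 raters x 248 images of random binary labels.
rng = np.random.default_rng(0)
synthetic = rng.integers(0, 2, size=(7, 248))
print(median_pairwise_kappa(synthetic))
```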
In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. Agreement was relatively low for the identification of dots and globules (median κ values of 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (median κ values of 0.581 and 0.798, respectively). The expert raters showed the same pattern, with median κ values of 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average-expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels.
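The abstract does not define the "thresholded average-expert reader" precisely; one common construction, assumed here, is a majority vote: average each image's binary expert ratings and threshold at 0.5, then compare that consensus label against the crowd's consensus with Cohen κ. The cutoff value and the synthetic rating matrices below are assumptions for illustration.

```python
# Hedged sketch of the crowd-vs-expert comparison, assuming a simple
# majority-vote thresholding rule. Synthetic data only.
import numpy as np
from sklearn.metrics import cohen_kappa_score


def threshold_average_reader(ratings: np.ndarray, cutoff: float = 0.5) -> np.ndarray:
    """Collapse an (n_raters, n_images) binary matrix into one consensus
    label per image by thresholding the mean rating at `cutoff`."""
    return (ratings.mean(axis=0) >= cutoff).astype(int)


rng = np.random.default_rng(1)
expert = rng.integers(0, 2, size=(7, 248))   # 7 dermatologists (synthetic)
crowd = rng.integers(0, 2, size=(50, 248))   # nonexpert crowd (synthetic)

kappa = cohen_kappa_score(
    threshold_average_reader(expert),
    threshold_average_reader(crowd),
)
print(f"crowd vs thresholded average-expert kappa: {kappa:.3f}")
```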
This study confirmed that IRR varied across dermoscopic features among a group of experts; a similar pattern was observed in a nonexpert crowd. Agreement between the crowd and the experts was good or excellent for each of the 6 superfeatures, indicating that the crowd labels dermoscopic images with reliability comparable to that of experts. This supports the feasibility and dependability of crowdsourcing as a scalable solution for annotating large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.