
Gamified Crowdsourcing as a Novel Approach to Lung Ultrasound Data Set Labeling: Prospective Analysis.

Affiliations

Department of Emergency Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States.

Centaur Labs, Boston, MA, United States.

Publication information

J Med Internet Res. 2024 Jul 4;26:e51397. doi: 10.2196/51397.

Abstract

BACKGROUND

Machine learning (ML) models can yield faster and more accurate medical diagnoses; however, developing ML models is limited by a lack of high-quality labeled training data. Crowdsourced labeling is a potential solution but can be constrained by concerns about label quality.

OBJECTIVE

This study aims to examine whether a gamified crowdsourcing platform with continuous performance assessment, user feedback, and performance-based incentives could produce expert-quality labels on medical imaging data.

METHODS

In this diagnostic comparison study, 2384 lung ultrasound clips were retrospectively collected from 203 emergency department patients. A total of 6 lung ultrasound experts classified 393 of these clips as having no B-lines, one or more discrete B-lines, or confluent B-lines to create 2 reference standard data sets: a training set of 195 clips and a test set of 198 clips. These sets were used, respectively, to (1) train users on a gamified crowdsourcing platform and (2) compare the concordance of the resulting crowd labels with the concordance of individual experts against the reference standard. Crowd opinions were sourced from users of the DiagnosUs (Centaur Labs) iOS app over 8 days, filtered based on past performance, aggregated by majority rule, and analyzed for label concordance against the hold-out test set of expert-labeled clips. The primary outcome was the labeling concordance of the aggregated crowd opinions, compared with that of trained experts, in classifying B-lines on lung ultrasound clips.
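The aggregation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the clip identifiers and label names are hypothetical, but the logic — majority vote per clip, then fraction of clips matching the expert reference — follows the method as stated.

```python
from collections import Counter

# Hypothetical label vocabulary for the three B-line classes
LABELS = ("no_b_lines", "discrete_b_lines", "confluent_b_lines")

def majority_label(opinions):
    """Return the most common label among a clip's crowd opinions."""
    return Counter(opinions).most_common(1)[0][0]

def concordance(crowd_labels, reference_labels):
    """Fraction of clips whose crowd label matches the reference standard."""
    matches = sum(
        crowd_labels[clip] == ref for clip, ref in reference_labels.items()
    )
    return matches / len(reference_labels)

# Toy example with three clips (invented data, for illustration only)
opinions_by_clip = {
    "clip_1": ["no_b_lines"] * 5 + ["discrete_b_lines"] * 2,
    "clip_2": ["discrete_b_lines"] * 4 + ["confluent_b_lines"] * 3,
    "clip_3": ["confluent_b_lines"] * 6 + ["no_b_lines"],
}
crowd = {c: majority_label(ops) for c, ops in opinions_by_clip.items()}
reference = {
    "clip_1": "no_b_lines",
    "clip_2": "discrete_b_lines",
    "clip_3": "confluent_b_lines",
}
print(concordance(crowd, reference))  # 1.0
```

In the study itself, the crowd side of this comparison additionally filtered opinions by each user's past performance before the vote, a step omitted here for brevity.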

RESULTS

Our clinical data set included patients with a mean age of 60.0 (SD 19.0) years; 105 (51.7%) patients were female and 114 (56.1%) patients were White. Over the 195 training clips, the expert-consensus label distribution was 114 (58%) no B-lines, 56 (29%) discrete B-lines, and 25 (13%) confluent B-lines. Over the 198 test clips, the expert-consensus label distribution was 138 (70%) no B-lines, 36 (18%) discrete B-lines, and 24 (12%) confluent B-lines. In total, 99,238 opinions were collected from 426 unique users. On the test set of 198 clips, the mean labeling concordance of individual experts relative to the reference standard was 85.0% (SE 2.0), compared with 87.9% crowdsourced label concordance (P=.15). When individual experts' opinions were compared with reference standard labels created by majority vote excluding their own opinion, crowd concordance was higher than the mean concordance of individual experts to reference standards (87.4% vs 80.8%, SE 1.6 for expert concordance; P<.001). Clips with discrete B-lines generated the most disagreement with the expert consensus, both for the crowd consensus and for individual experts. Using randomly sampled subsets of crowd opinions, 7 quality-filtered opinions were sufficient to achieve near the maximum crowd concordance.
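The subset analysis in the final sentence — estimating concordance as a function of the number of opinions per clip — can be sketched as follows. This is a hedged illustration of the general technique (repeated random subsampling with a majority vote), not the authors' analysis code; the function name, trial count, and toy data are all invented.

```python
import random
from collections import Counter

def sampled_concordance(opinions_by_clip, reference, k, trials=200, seed=0):
    """Mean concordance with the reference when only k randomly
    sampled opinions vote per clip, averaged over repeated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        matches = 0
        for clip, ref in reference.items():
            sample = rng.sample(opinions_by_clip[clip], k)
            vote = Counter(sample).most_common(1)[0][0]
            matches += vote == ref
        total += matches / len(reference)
    return total / trials

# Toy data (invented): 10 opinions per clip, mostly agreeing with the reference
opinions = {
    "clip_a": ["no_b_lines"] * 9 + ["discrete_b_lines"],
    "clip_b": ["discrete_b_lines"] * 8 + ["no_b_lines"] * 2,
}
reference = {"clip_a": "no_b_lines", "clip_b": "discrete_b_lines"}

# Concordance generally rises toward the full-crowd value as k grows
print(sampled_concordance(opinions, reference, k=3))
```

Sweeping k over such a curve is how one would identify the point, reported in the study as 7 quality-filtered opinions, where additional opinions stop improving concordance.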

CONCLUSIONS

Crowdsourced labels for B-line classification on lung ultrasound clips via a gamified approach achieved expert-level accuracy. This suggests a strategic role for gamified crowdsourcing in efficiently generating labeled image data sets for training ML systems.

