Department of Psychological and Brain Sciences, Indiana University, 1101 E. 10th St., Bloomington, IN, 47405-7007, USA.
Cognitive Science Program, Indiana University, Bloomington, USA.
Cogn Res Princ Implic. 2024 May 20;9(1):31. doi: 10.1186/s41235-024-00558-6.
A crucial bottleneck in medical artificial intelligence (AI) is the scarcity of high-quality labeled medical datasets. In this paper, we test a large variety of wisdom-of-the-crowd algorithms for labeling medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. There was large dispersion in the geographical location, experience, training, and performance of the recruited individuals. We tested several wisdom-of-the-crowd algorithms of varying complexity, from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking into account the task environment. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.
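A minimal sketch (not the paper's actual models) of the two simplest ingredients the abstract names, selecting top performers and weighting each rater's vote by their training accuracy; the rater counts, labels, and accuracies below are hypothetical:

```python
from collections import Counter

def select_top_performers(accuracies, k):
    """Return indices of the k raters with the highest training accuracy."""
    return sorted(range(len(accuracies)),
                  key=lambda i: accuracies[i], reverse=True)[:k]

def weighted_vote(labels, weights):
    """Accuracy-weighted plurality vote over one image's crowd labels."""
    tally = Counter()
    for label, w in zip(labels, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]

# Hypothetical example: 5 raters assign one lesion to one of 7 categories (0-6)
labels = [4, 4, 1, 4, 2]
train_acc = [0.9, 0.8, 0.4, 0.7, 0.5]

top = select_top_performers(train_acc, 3)
consensus = weighted_vote([labels[i] for i in top],
                          [train_acc[i] for i in top])
```

An unweighted average corresponds to passing equal weights; the paper's Bayesian variants additionally model each rater's per-category error pattern, which this sketch omits.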