Mitry Danny, Peto Tunde, Hayat Shabina, Blows Peter, Morgan James, Khaw Kay-Tee, Foster Paul J
NIHR Biomedical Research Centre, Moorfields Eye Hospital and UCL Institute of Ophthalmology, London, United Kingdom.
Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Cambridge, United Kingdom.
PLoS One. 2015 Feb 18;10(2):e0117401. doi: 10.1371/journal.pone.0117401. eCollection 2015.
Crowdsourcing is the process of simplifying and outsourcing numerous tasks to many untrained individuals. Our aim was to assess the performance and repeatability of crowdsourcing in the classification of normal and glaucomatous discs from optic disc images.
Optic disc images (N = 127) with pre-determined disease status were selected by consensus agreement of grading experts from a large cohort study. After reading brief illustrative instructions, we asked knowledge workers (KWs) on a crowdsourcing platform (Amazon Mechanical Turk) to classify each image as normal or abnormal. Each image was classified 20 times by different KWs. Two study designs were examined to assess the effect of varying KW experience, and both designs were run twice to assess consistency. Performance was assessed by comparing sensitivity, specificity and the area under the receiver operating characteristic curve (AUC); a sketch of this aggregation and scoring follows.
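The abstract does not state the exact aggregation rule, so the following is a minimal illustrative sketch, assuming each image's score is the fraction of its 20 KW classifications that call it abnormal, with a simple majority threshold for the binary call. The vote data here are simulated; only the structure (127 images, 20 classifications each) comes from the study.

```python
# Illustrative sketch (not the authors' code) of aggregating repeated
# crowd classifications and scoring them against expert ground truth.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

n_images, n_kws = 127, 20
truth = rng.integers(0, 2, n_images)             # 1 = glaucomatous, 0 = normal (simulated)

# Simulated KW votes: each image classified 20 times (True = "abnormal").
votes = rng.random((n_images, n_kws)) < (0.3 + 0.4 * truth[:, None])

# Score each image by the fraction of KWs calling it abnormal.
score = votes.mean(axis=1)

# Binary call at an assumed simple-majority threshold.
call = (score >= 0.5).astype(int)

tp = np.sum((call == 1) & (truth == 1))
fn = np.sum((call == 0) & (truth == 1))
tn = np.sum((call == 0) & (truth == 0))
fp = np.sum((call == 1) & (truth == 0))

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc_score(truth, score)                # threshold-free summary
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")
```

Using the graded vote fraction (rather than the binary call) as the score is what allows an ROC curve, and hence an AUC, to be traced across thresholds.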
Overall, 2,540 classifications were received in under 24 hours at minimal cost. Sensitivity ranged from 83% to 88% across both trials and study designs; specificity, however, was poor, ranging from 35% to 43%. In trial 1 the highest AUC (95% CI) was 0.64 (0.62-0.66), and in trial 2 it was 0.63 (0.61-0.65). There were no significant differences between study designs or between trials.
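The abstract does not say how the 95% confidence intervals on the AUC were obtained. One common choice is a nonparametric bootstrap over images, sketched below under that assumption; the function name and parameters are illustrative, and `truth` and `score` are the arrays from the previous sketch.

```python
# Hedged sketch: a bootstrap 95% CI for the AUC, resampling images
# with replacement. This is one plausible method, not necessarily the
# one used in the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(truth, score, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(truth)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample images with replacement
        if truth[idx].min() == truth[idx].max():
            continue                             # AUC needs both classes present
        aucs.append(roc_auc_score(truth[idx], score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Example: lo, hi = bootstrap_auc_ci(truth, score)
```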
Crowdsourcing represents a cost-effective method of image analysis that demonstrates good repeatability and high sensitivity. Optimisation of variables such as reward schemes, the mode of image presentation, expanded response options and the incorporation of training modules should be examined to determine their effect on the accuracy and reliability of this technique in retinal image analysis.