Southern Swedish Forest Research Centre, Swedish University of Agricultural Sciences, Alnarp, Sweden.
Faculty of International Studies, Utsunomiya University, Utsunomiya, Japan.
PLoS One. 2022 May 19;17(5):e0267114. doi: 10.1371/journal.pone.0267114. eCollection 2022.
Involving members of the public in image classification tasks that are difficult to automate is increasingly recognized as a way to complete large numbers of such tasks and to promote citizen involvement in science. Although this labor is usually provided for free, it is still limited, so researchers need to use volunteer contributions as efficiently as possible. Efficient use becomes complicated when each task is assigned to multiple volunteers to increase confidence that the correct classification has been reached. In this paper, we develop a system to decide when enough information has been accumulated to confidently declare an image classified and remove it from circulation. We use a Bayesian approach to estimate the posterior distribution of the mean rating in a binary image classification task. Tasks are removed from circulation when user-defined certainty thresholds are reached. We demonstrate this process using a set of over 4.5 million unique classifications, made by 2783 volunteers, of over 190,000 images assessed for the presence or absence of cropland. Had the system outlined here been implemented in the original data collection campaign, it would have eliminated the need for 59.4% of volunteer ratings. Had this saved effort been applied to new tasks, an estimated 2.46 times as many images could have been classified with the same amount of labor, demonstrating the power of this method to make more efficient use of limited volunteer contributions. To simplify implementation of this method by other investigators, we provide cutoff value combinations for one set of confidence levels.
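The retirement rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a uniform Beta(1, 1) prior on the probability p that a volunteer rates an image as containing cropland, a binomial likelihood for the yes/no ratings, and a rule that retires an image once the posterior probability that p lies above (or below) 0.5 reaches a chosen confidence. The paper's own prior, decision rule and cutoff tables may differ; the function names `prob_mean_above`, `retirement_decision` and `ratings_needed`, and the default confidence of 0.95, are hypothetical.

```python
from math import comb


def prob_mean_above(yes: int, no: int, cutoff: float = 0.5,
                    prior_a: int = 1, prior_b: int = 1) -> float:
    """Posterior P(p > cutoff) under an (assumed) Beta(prior_a, prior_b) prior.

    After `yes` positive and `no` negative ratings the posterior is
    Beta(prior_a + yes, prior_b + no). For integer shape parameters the Beta
    CDF I_x(a, b) equals P(Binomial(a + b - 1, x) >= a), so no special
    functions are needed.
    """
    a = prior_a + yes
    b = prior_b + no
    n = a + b - 1
    # Posterior mass at or below the cutoff, via the binomial upper tail.
    cdf = sum(comb(n, j) * cutoff**j * (1 - cutoff) ** (n - j)
              for j in range(a, n + 1))
    return 1.0 - cdf


def retirement_decision(yes: int, no: int, confidence: float = 0.95,
                        cutoff: float = 0.5) -> str:
    """Return 'present', 'absent', or 'keep' (image stays in circulation)."""
    if not 0.5 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0.5 and 1")
    p_above = prob_mean_above(yes, no, cutoff)
    if p_above >= confidence:
        return "present"
    if 1.0 - p_above >= confidence:
        return "absent"
    return "keep"


def ratings_needed(ratings, confidence: float = 0.95, cutoff: float = 0.5):
    """Feed a stream of 0/1 ratings; stop as soon as the image can be retired.

    Returns (decision, number_of_ratings_used).
    """
    yes = no = 0
    for used, rating in enumerate(ratings, start=1):
        if rating:
            yes += 1
        else:
            no += 1
        decision = retirement_decision(yes, no, confidence, cutoff)
        if decision != "keep":
            return decision, used
    return "keep", yes + no


if __name__ == "__main__":
    print(ratings_needed([1] * 10))      # unanimous "cropland present"
    print(ratings_needed([0] * 10))      # unanimous "cropland absent"
    print(ratings_needed([1, 0, 1, 0]))  # split votes: not yet retired
```

Under these assumptions, four unanimous ratings give a posterior probability of 1 - 0.5^5 = 0.96875 on the majority side, so the image is retired after the fourth rating rather than after every assigned rating, while split votes keep the image in circulation. The two efficiency figures in the abstract are consistent with each other: removing 59.4% of ratings leaves 40.6% of the original labor per image, and 1 / 0.406 ≈ 2.46.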