Transparency in Algorithms Group, CYENS - Centre of Excellence, Nicosia, Cyprus.
IMDEA Networks Institute, Leganés (Madrid), Spain.
PLoS One. 2021 Jun 16;16(6):e0252604. doi: 10.1371/journal.pone.0252604. eCollection 2021.
Crowdsourcing systems are evolving into a powerful tool of choice for repetitive or lengthy human-based tasks. Prominent among them is Amazon Mechanical Turk (MTurk), in which requesters post Human Intelligence Tasks (HITs) that are then selected and executed by the (human) workers subscribed to the platform. These HITs often serve research purposes. In this context, a very important question is how reliable the results obtained through these platforms are, given the limited control a requester has over the workers' actions. Various control techniques have been proposed, but none is free from shortcomings, and their use must be accompanied by a deeper understanding of the workers' behavior. In this work, we attempt to interpret the workers' behavior and reliability in the absence of control techniques. To do so, we perform a series of experiments with 600 distinct MTurk workers, specifically designed to elicit each worker's level of dedication to a task according to the task's nature and difficulty. We show that the time a worker takes to carry out a task correlates with the task's difficulty and also with the quality of the outcome. We find that there are different types of workers. While some of them are willing to invest a significant amount of time to arrive at the correct answer, we also observe a significant fraction of workers who reply with a wrong answer. For the latter, the difficulty of the task and the very short time they took to reply suggest that they intentionally did not even attempt to solve the task.