Samimi Parnia, Ravana Sri Devi
Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia.
ScientificWorldJournal. 2014;2014:135641. doi: 10.1155/2014/135641. Epub 2014 May 19.
Test collections are used to evaluate information retrieval systems in laboratory-based evaluation experiments. In the classic setting, generating relevance judgments involves human assessors and is a costly and time-consuming task. Researchers and practitioners are still challenged to perform reliable, low-cost evaluations of retrieval systems. Crowdsourcing, a novel method of data acquisition, is widely used in many research fields. Crowdsourcing has proven to be an inexpensive and fast solution as well as a reliable alternative for creating relevance judgments. One application of crowdsourcing in IR is judging the relevance of query-document pairs. For a crowdsourcing experiment to succeed, the relevance judgment tasks should be designed carefully, with an emphasis on quality control. This paper explores the factors that influence the accuracy of relevance judgments made by workers and how to improve the reliability of judgments in crowdsourcing experiments.
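As one concrete illustration of the quality-control idea the abstract raises, crowdsourced relevance labels for a single query-document pair are commonly aggregated across workers by majority vote. The minimal sketch below is not from the paper itself; the function name, binary labels, and conservative tie-breaking rule are assumptions chosen for illustration only.

```python
from collections import Counter

def aggregate_judgments(labels):
    """Aggregate several workers' relevance labels for one
    query-document pair by majority vote (a common baseline;
    assumed here, not the paper's specific method)."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    # If two labels are tied for most frequent, fall back to
    # "not relevant" (0) as a conservative default (an assumption).
    tied = [label for label, c in counts.items() if c == top_count]
    if len(tied) > 1:
        return 0
    return top_label

# Hypothetical example: five workers label one query-document pair
# (1 = relevant, 0 = not relevant).
worker_labels = [1, 1, 0, 1, 0]
print(aggregate_judgments(worker_labels))  # -> 1
```

In practice, more refined schemes weight each worker's vote by an estimated reliability (for example, accuracy on gold-standard test items), which is one of the quality-control factors the paper examines.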