Heim Eric, Seitel Alexander, Andrulis Jonas, Isensee Fabian, Stock Christian, Ross Tobias, Maier-Hein Lena
IEEE Trans Pattern Anal Mach Intell. 2018 Dec;40(12):2814-2826. doi: 10.1109/TPAMI.2017.2777967. Epub 2017 Nov 27.
With the rapidly increasing interest in machine-learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is one of the major bottlenecks in the field. Crowdsourcing has evolved as a valuable option for low-cost and large-scale data annotation; however, quality control remains a major issue that needs to be addressed. To our knowledge, we are the first to analyze the annotation process to improve crowdsourced image segmentation. Our method involves training a regressor to estimate the quality of a segmentation from the annotator's clickstream data. The quality estimate can be used to identify spam and to weight individual annotations by their (estimated) quality when merging multiple segmentations of one image. Using a total of 29,000 crowd annotations performed on publicly available data of different object classes, we show that our method (1) is highly accurate in estimating the segmentation quality based on clickstream data and (2) outperforms state-of-the-art methods for merging multiple annotations. As the regressor does not need to be trained on the object class that it is applied to, it can be regarded as a low-cost option for quality control and confidence analysis in the context of crowd-based image annotation.
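The merging step described in the abstract (discard suspected spam, then weight each remaining annotation by its estimated quality) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `merge_segmentations`, the `spam_threshold` value, the 0.5 vote cutoff, and the use of a simple weighted per-pixel vote are all assumptions made for illustration. The quality scores are taken as given here; in the paper they would come from the regressor trained on clickstream data.

```python
import numpy as np


def merge_segmentations(masks, est_quality, spam_threshold=0.3):
    """Merge binary masks of one image by a quality-weighted pixel vote.

    Hypothetical helper: masks has shape (n_annotations, H, W) and
    est_quality holds one estimated quality score per annotation.
    """
    masks = np.asarray(masks, dtype=float)
    weights = np.asarray(est_quality, dtype=float)

    # Treat annotations with a low estimated quality as spam and drop them.
    keep = weights >= spam_threshold
    if not keep.any():
        raise ValueError("all annotations were rejected as spam")
    masks, weights = masks[keep], weights[keep]

    # Weighted average over annotations, giving one value in [0, 1] per pixel.
    vote = np.tensordot(weights, masks, axes=1) / weights.sum()

    # A pixel is foreground if at least half of the total weight says so.
    return vote >= 0.5


# Toy example: three 2x2 annotations of the same image.
masks = [
    [[1, 1], [0, 0]],  # careful annotator
    [[1, 0], [0, 0]],  # slightly worse annotator
    [[0, 0], [1, 1]],  # spam-like annotation
]
est_quality = [0.9, 0.8, 0.1]  # assumed regressor outputs
merged = merge_segmentations(masks, est_quality)
print(merged)
```

In this toy example the third annotation falls below the assumed threshold and is ignored, so the merged mask follows the two higher-quality annotations, with the better-rated one deciding the pixel on which they disagree.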