IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1675-1688. doi: 10.1109/TNNLS.2017.2677468. Epub 2017 Mar 22.
Crowdsourcing systems provide a cost-effective and convenient way to collect labels, but they often fail to guarantee label quality. This paper proposes a novel framework that introduces noise correction techniques to further improve the quality of the integrated labels inferred from the multiple noisy labels of each object. In the proposed general framework, information about labeler quality, estimated by a front-end ground-truth inference algorithm, is used to supervise subsequent label noise filtering and correction. The framework uses a novel algorithm, termed adaptive voting noise correction (AVNC), to identify and correct potentially noisy labels. After the instances with noisy labels are filtered out, the remaining cleansed data set is used to build multiple weak classifiers, from which a powerful ensemble classifier is induced to correct the noisy labels. Experimental results on eight simulated data sets with different kinds of features and two real-world crowdsourcing data sets from different domains consistently show that: 1) the proposed framework improves label quality regardless of the inference algorithm, especially when each instance has only a few repeated labels, and 2) because the proposed AVNC algorithm considers both the number and the probability of potentially noisy labels, it outperforms state-of-the-art noise correction algorithms.
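The filter-then-correct pipeline described in the abstract can be illustrated with a minimal sketch. This is not the AVNC algorithm itself, whose details the abstract does not give. It assumes majority voting as the front-end inference step, labeler quality estimated as agreement with the integrated labels, a fixed confidence threshold for filtering, and nearest-centroid models trained on leave-one-fold-out subsets as the weak classifiers. All function names, parameters, and the threshold value are hypothetical.

```python
from collections import Counter

def majority_vote(crowd_labels):
    """Front-end inference (assumed): most common label per instance."""
    return [Counter(v.values()).most_common(1)[0][0] for v in crowd_labels]

def labeler_quality(crowd_labels, integrated):
    """Estimate each labeler's quality as agreement with the integrated labels."""
    agree, total = Counter(), Counter()
    for votes, y in zip(crowd_labels, integrated):
        for worker, label in votes.items():
            total[worker] += 1
            agree[worker] += (label == y)
    return {w: agree[w] / total[w] for w in total}

def confidence(votes, y, quality):
    """Quality-weighted share of the votes that support integrated label y."""
    weight = sum(quality[w] for w in votes)
    support = sum(quality[w] for w, label in votes.items() if label == y)
    return support / weight if weight else 0.0

def fit_centroids(X, y):
    """Weak classifier (assumed): one mean feature vector per class."""
    sums, counts = {}, Counter()
    for x, c in zip(X, y):
        acc = sums.setdefault(c, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[c] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def correct_labels(X, crowd_labels, threshold=0.8, n_models=3):
    """Filter low-confidence instances, then relabel them by ensemble vote."""
    integrated = majority_vote(crowd_labels)
    quality = labeler_quality(crowd_labels, integrated)
    conf = [confidence(v, y, quality) for v, y in zip(crowd_labels, integrated)]
    clean = [i for i, c in enumerate(conf) if c >= threshold]
    noisy = [i for i, c in enumerate(conf) if c < threshold]
    # Each weak model sees the clean set minus one fold (hypothetical scheme).
    models = []
    for k in range(n_models):
        subset = [i for pos, i in enumerate(clean) if pos % n_models != k]
        if subset:
            models.append(fit_centroids([X[i] for i in subset],
                                        [integrated[i] for i in subset]))
    corrected = list(integrated)
    if not models:
        return corrected  # nothing reliable to learn from
    for i in noisy:
        votes = Counter(predict_centroid(m, X[i]) for m in models)
        corrected[i] = votes.most_common(1)[0][0]
    return corrected
```

Here `crowd_labels` is a list with one dictionary per instance, mapping a labeler identifier to that labeler's label, and `X` holds the matching feature vectors. The fixed threshold is the main simplification: the abstract states that AVNC adapts to both the number and the probability of potentially noisy labels, which this sketch does not attempt to reproduce.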