IEEE Trans Pattern Anal Mach Intell. 2021 Aug;43(8):2866-2873. doi: 10.1109/TPAMI.2020.3046486. Epub 2021 Jul 1.
The advances made in predicting visual saliency using deep neural networks come at the expense of collecting large-scale annotated data. However, pixel-wise annotation is labor-intensive and time-consuming. In this paper, we propose to learn saliency prediction from a single noisy labelling, which is easy to obtain (e.g., from imperfect human annotation or from unsupervised saliency prediction methods). With this goal, we address a natural question: can we learn saliency prediction while identifying clean labels in a unified framework? To answer this question, we draw on the theory of robust model fitting, formulate deep saliency prediction from a single noisy labelling as robust network learning, and exploit model consistency across iterations to identify inliers and outliers (i.e., noisy labels). Extensive experiments on different benchmark datasets demonstrate the superiority of our proposed framework, which achieves saliency prediction comparable to that of state-of-the-art fully supervised methods. Furthermore, we show that simply by treating ground-truth annotations as noisy labelling, our framework achieves tangible improvements over state-of-the-art methods.
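The key mechanism, using model consistency across training iterations to separate inliers from noisy (outlier) labels, can be sketched with a toy logistic model. This is a hypothetical simplification for illustration only, not the paper's implementation: the model, the warmup length, and the consistency threshold `tol` are all assumptions.

```python
import numpy as np

def robust_fit(X, y_noisy, iters=150, lr=0.3, warmup=20, tol=0.2):
    """Illustrative sketch of robust fitting from noisy labels.

    After a warmup phase, samples whose predictions stay consistent
    across consecutive iterations are treated as inliers; gradient
    updates use only those samples, so outlier labels are ignored.
    (Hypothetical simplification of the consistency idea.)
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    prev = None
    inlier = np.ones(len(y_noisy), dtype=bool)  # initially trust all labels
    for t in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # current model predictions
        if t >= warmup and prev is not None:
            # consistency across iterations: stable predictions -> inliers
            inlier = np.abs(p - prev) < tol
        prev = p
        # gradient step on the inlier subset only
        n = max(inlier.sum(), 1)
        grad = X[inlier].T @ (p[inlier] - y_noisy[inlier]) / n
        w -= lr * grad
    return w, inlier
```

In the paper's setting the "model" is a deep saliency network and the labels are pixel-wise noisy saliency maps; the sketch above only conveys the inlier-selection loop.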