IEEE Trans Image Process. 2020;29:128-141. doi: 10.1109/TIP.2019.2930874. Epub 2019 Jul 30.
Weakly supervised image semantic segmentation tasks and methods have been proposed in recent years and have achieved steadily improving performance. However, the purpose of these tasks is mainly to simplify the labeling work. In this paper, we establish a new and more challenging task condition: weaklier supervision with only one image-level annotation per category, which provides only the prior knowledge that humans need to recognize new objects, and aims to achieve pixel-level semantic understanding of objects. To solve this problem, we propose a three-stage semantic segmentation framework that learns image-level, pixel-level, and object common features in a coarse-to-fine manner, and finally obtains semantic segmentation results with accurate and complete object regions. Experiments on the PASCAL VOC 2012 dataset demonstrate the effectiveness of the proposed method, which clearly improves on the baselines. Despite using less supervision, the method also performs satisfactorily compared with weakly supervised methods that use complete image-level annotations.