Lee Minhyun, Lee Seungho, Lee Jongwuk, Shim Hyunjung
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12341-12357. doi: 10.1109/TPAMI.2023.3273592. Epub 2023 Sep 5.
Existing studies on semantic segmentation using image-level weak supervision have several limitations, including sparse object coverage, inaccurate object boundaries, and co-occurring pixels from non-target objects. To overcome these challenges, we propose a novel framework, EPS++, an improved version of Explicit Pseudo-pixel Supervision (EPS), which learns from pixel-level feedback by combining two types of weak supervision. Specifically, the image-level label provides the object identity via the localization map, and the saliency map from an off-the-shelf saliency detection model provides rich object boundaries. We devise a joint training strategy to fully exploit the complementary relationship between these two disparate sources of information. Notably, we propose an Inconsistent Region Drop (IRD) strategy, which effectively handles errors in saliency maps using fewer hyper-parameters than EPS. Our method obtains accurate object boundaries and discards co-occurring pixels, significantly improving the quality of pseudo-masks. Experimental results show that EPS++ effectively resolves the key challenges of semantic segmentation using weak supervision, achieving new state-of-the-art performance on three benchmark datasets in the weakly supervised semantic segmentation setting. Furthermore, we show that the proposed method can be extended to the semi-supervised semantic segmentation problem using image-level weak supervision. In that setting, the proposed model also achieves new state-of-the-art performance on two popular benchmark datasets.
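The abstract does not spell out how inconsistent regions are detected, so the following is only a minimal NumPy sketch of the general idea: compare a foreground estimate derived from the localization maps against the saliency map, and exclude pixels where the two cues disagree from the saliency loss. The function names, the max-over-classes foreground estimate, the tolerance `tol`, and the mean-squared-error loss are all assumptions made for illustration. They are not taken from the paper and should not be read as the EPS++ formulation.

```python
import numpy as np


def estimated_foreground(loc_maps):
    """Collapse per-class localization maps of shape (C, H, W) into one map.

    Assumption: values lie in [0, 1] and the per-pixel maximum over classes
    is used as a stand-in for "some target object is present here".
    """
    return np.clip(loc_maps.max(axis=0), 0.0, 1.0)


def consistency_mask(foreground, saliency, tol=0.5):
    """Return True where the two cues agree to within `tol` (hypothetical rule)."""
    return np.abs(foreground - saliency) <= tol


def masked_saliency_loss(foreground, saliency, keep):
    """Mean squared error computed only over pixels marked True in `keep`."""
    if not keep.any():
        return 0.0
    diff = (foreground - saliency)[keep]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    # Two classes on a 2x2 image. The bottom-right pixel is strongly
    # activated for class 1, but the saliency map marks it as background.
    loc_maps = np.array([
        [[0.9, 0.1], [0.8, 0.0]],
        [[0.0, 0.2], [0.1, 0.9]],
    ])
    saliency = np.array([[1.0, 0.0], [1.0, 0.0]])

    fg = estimated_foreground(loc_maps)
    keep = consistency_mask(fg, saliency, tol=0.5)
    print(keep)
    print(masked_saliency_loss(fg, saliency, keep))
```

In this toy example the bottom-right pixel is the only one dropped, so a likely saliency error there no longer pulls the localization map toward background. A real training setup would apply the same masking to a differentiable loss inside the segmentation framework rather than to NumPy arrays.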