Jiang Peng-Tao, Han Ling-Hao, Hou Qibin, Cheng Ming-Ming, Wei Yunchao
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7062-7077. doi: 10.1109/TPAMI.2021.3092573. Epub 2022 Sep 14.
Object attention maps generated by image classifiers are commonly used as priors for weakly supervised semantic segmentation. However, these attention maps usually cover only the most discriminative object parts, and the lack of integral object localization maps heavily limits the performance of weakly supervised segmentation approaches. This paper investigates a novel way to identify entire object regions in a weakly supervised manner. We observe that the attention maps produced by an image classifier at different training phases may focus on different parts of the target objects. Based on this observation, we propose an online attention accumulation (OAA) strategy that utilizes the attention maps from different training phases to obtain more integral object regions. Specifically, we maintain a cumulative attention map for each target category in each training image and use it to record the object regions discovered at different training phases. Although OAA can effectively mine more object regions for most images, for some training images the range of attention movement is small, which limits the generation of integral object attention regions. To overcome this problem, we incorporate an attention drop layer into the online attention accumulation process to explicitly enlarge the range of attention movement during training. OAA can be plugged into any classification network and progressively accumulates the discriminative regions into the cumulative attention maps as training proceeds. We further explore using the final cumulative attention maps as pixel-level supervision, which helps the network discover more integral object regions. When the resulting attention maps are applied to the weakly supervised semantic segmentation task, our approach improves upon existing state-of-the-art methods on the PASCAL VOC 2012 segmentation benchmark, achieving a mIoU score of 67.2 percent on the test set.
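To make the accumulation idea concrete, the sketch below shows one way the per-image, per-category cumulative attention maps described in the abstract could be maintained during training. The element-wise-maximum fusion, the min-max normalization, and the `AttentionAccumulator` class and its methods are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): accumulating class attention maps
# during training, keeping one cumulative map per (image, target category) pair.
import torch


class AttentionAccumulator:
    """Maintains a cumulative attention map for every (image id, class) pair."""

    def __init__(self):
        # (image_id, class_idx) -> H x W tensor with values in [0, 1]
        self.cumulative = {}

    @torch.no_grad()
    def update(self, image_id: str, class_idx: int, attention_map: torch.Tensor):
        # Normalize the current attention map to [0, 1] before fusing it.
        att = attention_map.float()
        att = (att - att.min()) / (att.max() - att.min() + 1e-5)
        key = (image_id, class_idx)
        if key not in self.cumulative:
            self.cumulative[key] = att.clone()
        else:
            # Element-wise maximum keeps every region discovered so far,
            # so attention that moves to new object parts at later training
            # phases is accumulated instead of overwritten.
            self.cumulative[key] = torch.maximum(self.cumulative[key], att)

    def get(self, image_id: str, class_idx: int) -> torch.Tensor:
        return self.cumulative[(image_id, class_idx)]
```

In this sketch, the attention map of every ground-truth class of an image would be passed to `update()` at each training phase; after training, `get()` returns the accumulated map that serves as a localization prior (or, in the extended variant described in the abstract, as pixel-level supervision).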