Choe Junsuk, Lee Seungho, Shim Hyunjung
IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4256-4271. doi: 10.1109/TPAMI.2020.2999099. Epub 2021 Nov 3.
Both weakly supervised single object localization and weakly supervised semantic segmentation learn an object's location using only image-level labels. However, these techniques tend to cover only the most discriminative part of the object rather than the entire object. To address this problem, we propose an attention-based dropout layer, which uses the attention mechanism to locate the entire object efficiently. The layer has two key components: 1) hiding the most discriminative part from the model so that it captures the entire object, and 2) highlighting the informative region to improve the classification power of the model. Together, these allow the classifier to maintain reasonable accuracy while the entire object is covered. Through extensive experiments, we demonstrate that the proposed method effectively improves weakly supervised single object localization accuracy, achieving a new state-of-the-art localization accuracy on CUB-200-2011 and accuracy comparable to existing state-of-the-art methods on ImageNet-1k. The proposed method also improves weakly supervised semantic segmentation performance on Pascal VOC and MS COCO. Furthermore, the proposed method is more efficient than existing techniques in terms of parameter and computation overhead, and it can be easily applied to various backbone networks.
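The two components described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the attention map is computed or how the two components are combined, so the channel-wise average, the thresholding rule, the sigmoid, the random selection between the two maps, and the hyperparameter names `drop_rate` and `gamma` are all assumptions made for illustration. The sketch also assumes non-negative (post-ReLU) features.

```python
import numpy as np


def attention_dropout(features, drop_rate=0.75, gamma=0.9, rng=None, training=True):
    """Apply one attention-based dropout step to a (C, H, W) feature map.

    drop_rate and gamma are hypothetical hyperparameters chosen for this
    sketch; they are not taken from the abstract.
    """
    if not training:
        # Assumption: the layer is inactive at inference time.
        return features
    rng = np.random.default_rng() if rng is None else rng

    # Assumed self-attention map: channel-wise average, shape (H, W).
    attention = features.mean(axis=0)

    # Component 1: hide the most discriminative region.
    # Positions whose attention reaches gamma * max are zeroed out.
    drop_mask = (attention < gamma * attention.max()).astype(features.dtype)

    # Component 2: highlight the informative region.
    # A sigmoid turns the attention map into soft weights in (0, 1).
    importance_map = 1.0 / (1.0 + np.exp(-attention))

    # Assumption: apply one of the two maps at random on each call.
    use_drop = rng.random() < drop_rate
    selected = drop_mask if use_drop else importance_map

    # Broadcast the (H, W) map over all channels.
    return features * selected[None, :, :]
```

A call such as `attention_dropout(np.random.rand(64, 14, 14))` returns an array of the same shape. Because the sketch introduces no trainable parameters, it is at least consistent with the abstract's claim of low parameter overhead, though the actual layer may differ in its details.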