Jiang Peng-Tao, Zhang Chang-Bin, Hou Qibin, Cheng Ming-Ming, Wei Yunchao
IEEE Trans Image Process. 2021;30:5875-5888. doi: 10.1109/TIP.2021.3089943. Epub 2021 Jun 28.
Class activation maps are generated from the final convolutional layer of a CNN and highlight the discriminative object regions for a class of interest. These discovered regions have been widely used in weakly-supervised tasks. However, because the final convolutional layer has a small spatial resolution, such class activation maps can only locate coarse regions of the target objects, which limits the performance of weakly-supervised tasks that need pixel-accurate object locations. We therefore aim to extract finer-grained object localization information from class activation maps so that target objects can be located more accurately. In this paper, by rethinking the relationships between feature maps and their corresponding gradients, we propose a simple yet effective method, called LayerCAM, that can produce reliable class activation maps for different layers of a CNN. This property enables us to collect object localization information from coarse (rough spatial localization) to fine (precise fine-grained details) levels. We further integrate these maps into a single high-quality class activation map in which object-related pixels are better highlighted. To evaluate the quality of the class activation maps produced by LayerCAM, we apply them to weakly-supervised object localization and semantic segmentation. Experiments demonstrate that the class activation maps generated by our method are more effective and reliable than those produced by existing attention methods. The code will be made publicly available.
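The core idea the abstract describes, weighting each spatial location of a layer's feature maps by the positive part of its gradient and then summing over channels, can be sketched as follows. This is a minimal NumPy sketch based on our reading of the abstract, not the authors' released code; the function name `layercam` and the `(channels, height, width)` array layout are assumptions:

```python
import numpy as np

def layercam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Sketch of a LayerCAM-style map from one conv layer.

    activations, gradients: arrays of shape (C, H, W), e.g. captured
    with forward/backward hooks for the score of the target class.
    Returns a (H, W) map normalized to [0, 1].
    """
    weights = np.maximum(gradients, 0)           # element-wise ReLU on gradients
    cam = (weights * activations).sum(axis=0)    # weight each location, sum over channels
    cam = np.maximum(cam, 0)                     # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()                    # rescale to [0, 1] for visualization
    return cam
```

Because the weighting is per-location rather than a single global weight per channel (as in the original CAM/Grad-CAM formulation), the same routine can be applied to shallow layers, whose higher spatial resolution supplies the fine-grained detail the paper then fuses with coarser deep-layer maps.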