IEEE Trans Image Process. 2021;30:5920-5932. doi: 10.1109/TIP.2021.3088605. Epub 2021 Jun 29.
Multi-label image recognition is a practical and challenging task compared to single-label image classification. However, previous methods may be suboptimal because they rely on a large number of object proposals or on complex attentional-region generation modules. In this paper, we propose a simple but efficient two-stream framework that recognizes multi-category objects from the global image to local regions, similar to how human beings perceive objects. To bridge the gap between the global and local streams, we propose a multi-class attentional region module that aims to keep the number of attentional regions as small as possible and the diversity of these regions as high as possible. Our method recognizes multi-class objects efficiently and effectively, with an affordable computation cost and a parameter-free region localization module. On three multi-label image classification benchmarks, our method achieves new state-of-the-art results with a single model that uses only image semantics, without modeling label dependency. In addition, the effectiveness of the proposed method is extensively demonstrated under different factors such as global pooling strategy, input size and network architecture. Code has been made available at https://github.com/gaobb/MCAR.
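The global-to-local idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that is in the linked repository) and rests on several assumptions that the abstract does not state: the global stream is taken to yield per-class scores and per-class attention maps, regions are taken only for the `top_n` highest-scoring classes (few regions, each from a different class), each region is the bounding box of the thresholded attention map (no learned parameters), and global and local predictions are fused by a category-wise maximum. The names `select_regions`, `fuse_scores`, `top_n` and `threshold` are hypothetical.

```python
import numpy as np


def select_regions(global_scores, attention_maps, top_n=2, threshold=0.5):
    """Pick one box per top-scoring class (assumed selection rule).

    global_scores:  shape (C,), class confidences from the global stream.
    attention_maps: shape (C, H, W), non-negative class attention maps.
    Returns a list of (class_id, x0, y0, x1, y1) with exclusive x1, y1.
    """
    regions = []
    # Few regions: keep only the top_n most confident classes.
    # Diverse regions: each box comes from a different class map.
    top_classes = np.argsort(global_scores)[::-1][:top_n]
    for c in top_classes:
        amap = attention_maps[c]
        peak = amap.max()
        if peak <= 0:
            continue  # no activation for this class, so no region
        # Parameter-free localization: threshold relative to the peak,
        # then take the tight bounding box of the surviving cells.
        mask = amap >= threshold * peak
        rows = np.flatnonzero(mask.any(axis=1))
        cols = np.flatnonzero(mask.any(axis=0))
        regions.append(
            (int(c), int(cols[0]), int(rows[0]),
             int(cols[-1]) + 1, int(rows[-1]) + 1)
        )
    return regions


def fuse_scores(global_scores, local_scores):
    """Category-wise max over global and local predictions (assumed fusion).

    global_scores: shape (C,); local_scores: shape (N, C), one row per region.
    """
    stacked = np.vstack([global_scores[None, :], local_scores])
    return stacked.max(axis=0)
```

In a full pipeline, each returned box would be cropped from the input image, resized, and passed through the local stream to produce the rows of `local_scores`; that step is omitted here because it depends on the backbone network.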