Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition.

Publication information

IEEE Trans Image Process. 2021;30:5920-5932. doi: 10.1109/TIP.2021.3088605. Epub 2021 Jun 29.

Abstract

Multi-label image recognition is a practical and challenging task compared to single-label image classification. However, previous works may be suboptimal because of a great number of object proposals or complex attentional region generation modules. In this paper, we propose a simple but efficient two-stream framework to recognize multi-category objects from global image to local regions, similar to how human beings perceive objects. To bridge the gap between global and local streams, we propose a multi-class attentional region module which aims to make the number of attentional regions as small as possible and keep the diversity of these regions as high as possible. Our method can efficiently and effectively recognize multi-class objects with an affordable computation cost and a parameter-free region localization module. Over three benchmarks on multi-label image classification, our method achieves new state-of-the-art results with a single model only using image semantics without label dependency. In addition, the effectiveness of the proposed method is extensively demonstrated under different factors such as global pooling strategy, input size and network architecture. Code has been made available at https://github.com/gaobb/MCAR.
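The abstract does not spell out how the parameter-free region localization works, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes the global stream yields one score and one 2D activation map per class, keeps the top-scoring classes to hold the region count small, and takes the tightest box around each map's strongly activated cells. The names `bounding_box`, `discover_regions`, `top_n`, and `threshold` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive indices


def bounding_box(act_map: List[List[float]], threshold: float) -> Optional[Box]:
    """Tightest box around cells whose min-max normalized activation >= threshold."""
    flat = [v for row in act_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return None  # flat map: nothing to localize
    rows, cols = [], []
    for r, row in enumerate(act_map):
        for c, v in enumerate(row):
            if (v - lo) / (hi - lo) >= threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None  # only possible when threshold > 1
    return (min(rows), min(cols), max(rows), max(cols))


def discover_regions(
    class_maps: List[List[List[float]]],
    class_scores: List[float],
    top_n: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[int, Box]]:
    """Return (class index, box) for the top_n classes ranked by global score.

    Assumption: no learned parameters are involved; regions come only from
    ranking the scores and thresholding the per-class activation maps.
    """
    ranked = sorted(range(len(class_scores)), key=lambda k: class_scores[k], reverse=True)
    regions = []
    for k in ranked[:top_n]:
        box = bounding_box(class_maps[k], threshold)
        if box is not None:
            regions.append((k, box))
    return regions


# Toy 4x4 activation maps for three hypothetical classes.
maps = [
    [[0.9, 0.8, 0.1, 0.0], [0.7, 0.6, 0.1, 0.0], [0.1, 0.1, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.1, 0.1], [0.0, 0.1, 0.8, 1.0], [0.0, 0.0, 0.6, 0.9]],
]
scores = [0.95, 0.05, 0.80]
regions = discover_regions(maps, scores, top_n=2, threshold=0.5)
```

In a two-stream setup of the kind the abstract describes, each returned box would be cropped from the input image, resized, and passed to the local stream. A smaller `top_n` means fewer regions, and choosing them per class is one simple way to keep them diverse.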

Similar articles

Beyond Object Proposals: Random Crop Pooling for Multi-Label Image Recognition.

IEEE Trans Image Process. 2016 Dec;25(12):5678-5688. doi: 10.1109/TIP.2016.2612829. Epub 2016 Sep 22.

Spatial Context-Aware Object-Attentional Network for Multi-Label Image Classification.

IEEE Trans Image Process. 2023;32:3000-3012. doi: 10.1109/TIP.2023.3266161. Epub 2023 May 26.

Recurrently exploring class-wise attention in a hybrid convolutional and bidirectional LSTM network for multi-label aerial image classification.

ISPRS J Photogramm Remote Sens. 2019 Mar;149:188-199. doi: 10.1016/j.isprsjprs.2019.01.015.

Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition.

IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1371-1384. doi: 10.1109/TPAMI.2020.3025814. Epub 2022 Feb 3.

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.

IEEE Trans Image Process. 2017 Apr;26(4):2055-2068. doi: 10.1109/TIP.2017.2675339. Epub 2017 Feb 24.

Class Agnostic Image Common Object Detection.

IEEE Trans Image Process. 2019 Jan 9. doi: 10.1109/TIP.2019.2891124.

Graph embedding based multi-label Zero-shot Learning.

Neural Netw. 2023 Oct;167:129-140. doi: 10.1016/j.neunet.2023.08.023. Epub 2023 Aug 19.

HTD: Heterogeneous Task Decoupling for Two-Stage Object Detection.

IEEE Trans Image Process. 2021;30:9456-9469. doi: 10.1109/TIP.2021.3126423. Epub 2021 Nov 18.

S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification.

Sensors (Basel). 2022 Jul 20;22(14):5433. doi: 10.3390/s22145433.

Cited by

Fairer AI in ophthalmology via implicit fairness learning for mitigating sexism and ageism.

Nat Commun. 2024 Jun 4;15(1):4750. doi: 10.1038/s41467-024-48972-0.

[Understanding and Application of Multi-Task Learning in Medical Artificial Intelligence].

J Korean Soc Radiol. 2022 Nov;83(6):1208-1218. doi: 10.3348/jksr.2022.0155. Epub 2022 Nov 30.

S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification.

Sensors (Basel). 2022 Jul 20;22(14):5433. doi: 10.3390/s22145433.
