
SIM-OFE: Structure Information Mining and Object-Aware Feature Enhancement for Fine-Grained Visual Categorization.

Author Information

Sun Hongbo, He Xiangteng, Xu Jinglin, Peng Yuxin

Publication Information

IEEE Trans Image Process. 2024;33:5312-5326. doi: 10.1109/TIP.2024.3459788. Epub 2024 Sep 27.

Abstract

Fine-grained visual categorization (FGVC) aims to distinguish visual objects belonging to multiple subcategories of the same coarse-grained category. Subtle inter-class differences among these subcategories make the FGVC task challenging. Existing methods primarily focus on learning salient visual patterns while ignoring how to capture the object's internal structure, which makes it difficult to obtain complete discriminative regions within the object and limits FGVC performance. To address this issue, we propose a Structure Information Mining and Object-aware Feature Enhancement (SIM-OFE) method for fine-grained visual categorization, which explores the visual object's internal structure composition and appearance traits. Concretely, we first propose a simple yet effective hybrid perception attention module that locates visual objects based on global-scope and local-scope significance analyses. Then, a structure information mining module is proposed to model the distribution and context relations of critical regions within the object, highlighting both the whole object and the discriminative regions used to distinguish subtle differences. Finally, an object-aware feature enhancement module is proposed to combine global-scope and local-scope discriminative features in an attentive coupling way, yielding powerful visual representations for fine-grained recognition. Extensive experiments on three FGVC benchmark datasets demonstrate that our proposed SIM-OFE method achieves state-of-the-art performance.
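
To make the "attentive coupling" of global-scope and local-scope features more concrete, here is a minimal, hypothetical PyTorch-style sketch. It is not the authors' SIM-OFE implementation: the module names (HybridPerceptionAttention, ObjectAwareFusion), the tensor shapes, and the softmax gating scheme are assumptions introduced here purely for illustration.

```python
# Illustrative sketch only: NOT the authors' released code. All module names,
# shapes, and the fusion scheme are assumptions used to visualize the general
# idea of coupling global-scope and local-scope features with attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridPerceptionAttention(nn.Module):
    """Toy stand-in for a hybrid (global + local) significance analysis."""

    def __init__(self, channels: int):
        super().__init__()
        self.local_conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map
        global_sig = feat.mean(dim=1, keepdim=True)   # global-scope significance, (B, 1, H, W)
        local_sig = self.local_conv(feat)             # local-scope significance, (B, 1, H, W)
        return torch.sigmoid(global_sig + local_sig)  # soft object mask in [0, 1]


class ObjectAwareFusion(nn.Module):
    """Toy attentive coupling of global-scope and local-scope descriptors."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.attend = HybridPerceptionAttention(channels)
        self.gate = nn.Linear(2 * channels, 2)        # one weight per scope
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        attn = self.attend(feat)                      # (B, 1, H, W)
        # Global-scope descriptor: average pooling over the whole feature map.
        global_desc = feat.mean(dim=(2, 3))           # (B, C)
        # Local-scope descriptor: pooling restricted to attended (object) regions.
        local_desc = (feat * attn).sum(dim=(2, 3)) / attn.sum(dim=(2, 3)).clamp(min=1e-6)
        # Attentive coupling: a softmax gate decides each scope's contribution.
        weights = F.softmax(self.gate(torch.cat([global_desc, local_desc], dim=1)), dim=1)
        fused = weights[:, :1] * global_desc + weights[:, 1:] * local_desc
        return self.classifier(fused)


if __name__ == "__main__":
    model = ObjectAwareFusion(channels=256, num_classes=200)  # e.g. 200 bird subcategories
    dummy = torch.randn(2, 256, 14, 14)                       # batch of backbone features
    print(model(dummy).shape)                                  # torch.Size([2, 200])
```

Note that this toy mask only mimics the hybrid perception step; the paper's structure information mining module additionally models the distribution and context relations of critical regions within the object, which the sketch above does not attempt.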

