
SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization.

Authors

Bera Asish, Wharton Zachary, Liu Yonghuai, Bessis Nik, Behera Ardhendu

Publication

IEEE Trans Image Process. 2022 Sep 14;PP. doi: 10.1109/TIP.2022.3205215.

Abstract

Over the past few years, significant progress has been made in deep convolutional neural network (CNN)-based image recognition, largely due to the strong ability of such networks to mine discriminative object pose and part information from texture and shape. However, these cues are often insufficient for fine-grained visual classification (FGVC), which exhibits high intra-class and low inter-class variance due to occlusions, deformation, illumination, etc. Thus, an expressive feature representation describing global structural information is key to characterizing an object or scene. To this end, we propose a method that effectively captures subtle changes by aggregating context-aware features from the most relevant image regions, together with their importance in discriminating fine-grained categories, without requiring bounding-box and/or distinguishable-part annotations. Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs): it includes a simple yet effective relation-aware feature transformation and its refinement via a context-aware attention mechanism to boost the discriminability of the transformed features in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions, and it outperforms state-of-the-art approaches by a significant margin in recognition accuracy.
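The abstract's two core ideas — a relation-aware feature transformation over image regions and a context-aware attention that weights region importance — might be sketched very loosely as follows. This is not the authors' implementation; the function names, the scaled dot-product relation scores, and the sum-based importance heuristic are all illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_transform(regions):
    """regions: (N, d) array of N image-region features.
    Computes pairwise relation weights (self-attention style) and
    aggregates context from the most related regions."""
    d = regions.shape[1]
    scores = regions @ regions.T / np.sqrt(d)   # (N, N) relation scores
    rel = softmax(scores, axis=-1)              # row-normalized relation weights
    return rel @ regions                        # (N, d) context-aware features

def context_aware_attention(context):
    """Assigns a scalar importance to each region (hypothetical heuristic:
    softmax over summed activations) and pools into one image descriptor."""
    w = softmax(context.sum(axis=1))            # (N,) region importance
    return w @ context                          # (d,) pooled representation

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 64))           # e.g. 16 regions, 64-d features
pooled = context_aware_attention(relation_aware_transform(feats))
print(pooled.shape)                             # (64,)
```

In the actual model these operations would be learned (projection matrices before the dot products, trained attention parameters) and trained end-to-end with the backbone CNN, as the abstract describes.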

