Wang Jiahui, Xu Qin, Jiang Bo, Luo Bin, Tang Jinhui
IEEE Trans Image Process. 2024;33:4529-4542. doi: 10.1109/TIP.2024.3441813. Epub 2024 Aug 23.
Fine-grained visual classification aims to distinguish similar sub-categories, and it is challenging because variation within a sub-category is large while visual similarity between different sub-categories is high. Recently, methods that extract semantic parts from discriminative regions have attracted increasing attention. However, most existing methods extract part features through rectangular bounding boxes produced by an object detection module or an attention mechanism, which makes it difficult to capture the rich shape information of objects. In this paper, we propose a novel Multi-Granularity Part Sampling Attention (MPSA) network for fine-grained visual classification. First, a novel multi-granularity part retrospect block is designed to extract part information at different scales and to enhance the high-level feature representation with discriminative part features of different granularities. Then, to extract part features of various shapes at each granularity, we propose part sampling attention, which comprehensively samples the implicit semantic parts on the feature maps. The proposed part sampling attention not only considers the importance of the sampled parts but also adopts part dropout to reduce overfitting. In addition, we propose a novel multi-granularity fusion method that highlights foreground features and suppresses background noise with the assistance of the gradient class activation map. Experimental results demonstrate that the proposed MPSA achieves state-of-the-art performance on four commonly used fine-grained visual classification benchmarks. The source code is publicly available at https://github.com/mobulan/MPSA.
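The two ideas in the abstract that lend themselves to a small example are part sampling with importance weighting and part dropout, and activation-map-guided fusion across granularities. The NumPy sketch below is only an illustration of those ideas and is not the authors' implementation, which is available in the linked repository. The function names `sample_parts` and `cam_weighted_fusion`, the top-k selection rule, the softmax weighting, the averaging across scales, and the max-normalization of the activation map are all assumptions made for this example.

```python
import numpy as np


def sample_parts(features, scores, num_parts, drop_prob=0.0, rng=None):
    """Pool the highest-scoring feature-map positions into one part descriptor.

    features: (N, C) array, one C-dimensional feature per spatial position.
    scores:   (N,) array, assumed importance score for each position.
    Hypothetical helper: the selection and weighting rules are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample parts at arbitrary positions (not a rectangular box): top-k by score.
    idx = np.argsort(scores)[::-1][:num_parts]
    parts = features[idx]
    # Importance weighting: softmax over the scores of the sampled parts.
    s = scores[idx]
    weights = np.exp(s - s.max())
    # Part dropout: randomly discard sampled parts with probability drop_prob.
    keep = rng.random(len(idx)) >= drop_prob
    if not keep.any():
        keep[0] = True  # always retain the highest-scoring part
    weights = weights * keep
    weights = weights / weights.sum()
    return idx, weights @ parts


def cam_weighted_fusion(features_by_scale, cam):
    """Average same-shaped (N, C) features and reweight positions by an activation map.

    cam: (N,) class activation values; larger means more likely foreground.
    Hypothetical helper: the fusion rule here is an assumption.
    """
    cam = np.maximum(cam, 0.0)
    cam = cam / (cam.max() + 1e-8)  # scale to [0, 1]
    fused = sum(features_by_scale) / len(features_by_scale)
    # Positions with low activation (background) are suppressed toward zero.
    return fused * cam[:, None]
```

In this sketch, setting `drop_prob` to 0 at inference time keeps all sampled parts, which mirrors how dropout-style regularization is usually disabled outside training.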