IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9521-9535. doi: 10.1109/TPAMI.2021.3126668. Epub 2022 Nov 7.
Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works are mainly part-driven (either explicitly or implicitly), on the assumption that fine-grained information naturally resides within the parts. In this paper, we take a different stance and show that part operations are not strictly necessary: the key lies in encouraging the network to learn at different granularities and progressively fusing the multi-granularity features together. In particular, we propose: (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a consistent block convolution that encourages the network to learn category-consistent features at specific granularities. We evaluate on several standard FGVC benchmark datasets and demonstrate that the proposed method either outperforms existing alternatives or delivers competitive results. Code is available at https://github.com/PRIS-CV/PMG-V2.
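The abstract does not specify how the block convolution is realized. As a rough illustration only, the sketch below shows one plausible reading: a feature map is split into a grid of non-overlapping tiles and a filter is applied inside each tile independently, so that no response can draw on information from outside its own tile, which fixes the granularity at which features are computed. The function name `block_conv2d`, the parameter `n_blocks`, the per-tile zero padding, and the single-channel NumPy setting are all assumptions made for this example; they are not taken from the paper or from the PMG-V2 repository, which remains the authoritative implementation.

```python
import numpy as np


def block_conv2d(feature, kernel, n_blocks):
    """Hypothetical helper: filter each tile of an n_blocks x n_blocks grid separately.

    Uses 'same' output size with zero padding applied per tile, and (as in most
    deep learning libraries) computes cross-correlation rather than a flipped
    convolution. Assumes an odd-sized kernel and a feature map whose height and
    width are divisible by n_blocks.
    """
    h, w = feature.shape
    assert h % n_blocks == 0 and w % n_blocks == 0, "feature must tile evenly"
    bh, bw = h // n_blocks, w // n_blocks
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    out = np.zeros_like(feature, dtype=float)

    for bi in range(n_blocks):
        for bj in range(n_blocks):
            tile = feature[bi * bh:(bi + 1) * bh, bj * bw:(bj + 1) * bw]
            # Pad the tile on its own, so neighbouring tiles never leak in.
            padded = np.pad(tile, ((ph, ph), (pw, pw)))
            for i in range(bh):
                for j in range(bw):
                    window = padded[i:i + kh, j:j + kw]
                    out[bi * bh + i, bj * bw + j] = np.sum(window * kernel)
    return out


# Toy input: a single active location in a 4x4 feature map.
feature = np.zeros((4, 4))
feature[1, 1] = 1.0
kernel = np.ones((3, 3))

full = block_conv2d(feature, kernel, n_blocks=1)     # ordinary filtering
blocked = block_conv2d(feature, kernel, n_blocks=2)  # 2x2 grid of tiles

print(np.count_nonzero(full), np.count_nonzero(blocked))
```

In this toy case the ordinary filter spreads the single activation across tile boundaries, while the blocked version confines it to the top-left tile. A larger `n_blocks` would correspond to a finer granularity under this reading.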