IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):579-590. doi: 10.1109/TPAMI.2019.2933510. Epub 2022 Jan 7.
This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) that locates distinctive object parts by learning a bank of convolutional filters that act as discriminative part detectors. A group of informative parts can then be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) with two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates the part features and the global feature into a joint feature for the final classification. To learn powerful part features and strengthen the joint feature, we propose a Duplex Focal Loss for metric learning and part classification, which focuses training on hard examples. We further merge the PLN and PCN into a unified network that is trained end to end via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of the proposed method.
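The channel recalibration performed by an SE block can be illustrated with a minimal sketch in plain Python. It follows the generic squeeze, excitation, and scale steps of a Squeeze-and-Excitation block and is not the authors' implementation; the function name `se_recalibrate`, the weight matrices `w1` and `w2`, and the toy values are all hypothetical, and biases are omitted.

```python
import math

def se_recalibrate(feature_maps, w1, w2):
    """Generic Squeeze-and-Excitation style channel recalibration (illustrative).

    feature_maps: list of C channels, each an H x W list of lists.
    w1: (C // r) x C weights of the bottleneck layer (hypothetical values).
    w2: C x (C // r) weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates

# Toy input: two constant 2 x 2 channels (assumed values for illustration).
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
w1 = [[0.5, 0.5]]        # reduce 2 channels to 1 hidden unit
w2 = [[-1.0], [1.0]]     # expand back to one gate per channel
scaled, gates = se_recalibrate(features, w1, w2)
```

With these toy weights the gate for the second channel is larger than the gate for the first, so the second channel is emphasized and the first is suppressed. In a trained network the weights are learned rather than fixed by hand.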
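The abstract does not define the Duplex Focal Loss, so the sketch below shows only the standard focal loss, on the assumption that the proposed loss builds on the same idea of down-weighting easy examples. The function names and the default `gamma=2.0` are illustrative choices, not values taken from the paper.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy for one example: -log(p_t)."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probs: predicted class probabilities (assumed to sum to 1).
    target: index of the true class.
    gamma: focusing parameter; gamma = 0 recovers plain cross-entropy.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# An easy example (true class predicted with 0.9) and a hard one (0.1).
easy = [0.9, 0.1]
hard = [0.1, 0.9]
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

With `gamma = 2`, the modulating factor scales the easy example's loss by about 0.01 and the hard example's loss by about 0.81, so hard examples dominate the gradient.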