National Engineering Lab for Big Data Analytics and School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
School of Computer Science and Technology and Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Tech. R&D, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
Neural Comput. 2020 Sep;32(9):1664-1684. doi: 10.1162/neco_a_01302. Epub 2020 Jul 20.
In the real world, a limited number of labeled fine-grained images per class can hardly represent the class distribution effectively. Moreover, fine-grained images exhibit more subtle visual differences than simple images with obvious objects; that is, they have smaller interclass and larger intraclass variations. To address these issues, we propose an end-to-end attention-based model for fine-grained few-shot image classification (AFG), trained with the recent episode training strategy. It is composed mainly of a feature learning module, an image reconstruction module, and a label distribution module. The feature learning module devises a 3D-Attention mechanism that considers both the spatial positions and the channel attentions of the image features, in order to learn more discriminative local features that better represent the class distribution. The image reconstruction module calculates the mappings between the local features and the original images. It is constrained by a designed loss function that serves as auxiliary supervised information, so that learning each local feature requires no extra annotations. The label distribution module predicts the label distribution of a given unlabeled sample, using the local features to represent the image for classification. Through comprehensive experiments on Mini-ImageNet and three fine-grained data sets, we demonstrate that the proposed model outperforms competing methods.
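To make the idea of attention over both spatial positions and channels concrete, the following is a minimal NumPy sketch. It is not the 3D-Attention mechanism of the paper, whose exact formulation the abstract does not give. The function name `channel_spatial_attention`, the sigmoid channel gate, and the softmax over spatial positions are all assumptions chosen for illustration, and the sketch has no learned parameters.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


def channel_spatial_attention(features):
    """Reweight a (C, H, W) feature map with a joint channel/spatial map.

    Hypothetical illustration only: the gates below are parameter-free,
    whereas a real attention module would learn them.
    """
    # Channel gate (assumption): squash each channel's global average to (0, 1).
    channel_gate = sigmoid(features.mean(axis=(1, 2)))  # shape (C,)

    # Spatial weights (assumption): softmax over positions of the
    # channel-averaged response, so the weights sum to 1 over H x W.
    spatial_logits = features.mean(axis=0)  # shape (H, W)
    spatial_weights = softmax(spatial_logits.reshape(-1)).reshape(spatial_logits.shape)

    # Joint 3-D attention map: outer combination of both gates, shape (C, H, W).
    attention = channel_gate[:, None, None] * spatial_weights[None, :, :]
    return features * attention, attention


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(4, 3, 3))  # toy feature map: 4 channels, 3x3 grid
    weighted, attention = channel_spatial_attention(feature_map)
    print(weighted.shape, attention.shape)
```

The point of the sketch is only that one weight is assigned per (channel, row, column) entry, so a location can be emphasized in some channels and suppressed in others.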