IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):5632-5648. doi: 10.1109/TPAMI.2022.3217373. Epub 2023 Apr 3.
In this work, we approach few-shot image classification from a new perspective: optimal matching between image regions. We use the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations and thereby determine image relevance. The EMD generates the optimal matching flows between structural elements, i.e., the flows with the minimum total matching cost, and this cost is used to calculate the image distance for classification. To generate the importance weights of elements in the EMD formulation, we design a cross-reference mechanism, which effectively alleviates the adverse impact of cluttered backgrounds and large intra-class appearance variations. To handle k-shot classification, we propose learning a structured fully connected layer that directly classifies dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Extensive experiments validate the effectiveness of our algorithm, which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks: miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task.
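The matching step described in the abstract can be illustrated with a minimal sketch that treats the EMD between two sets of local features as a transportation problem and solves it as a linear program. This is not the authors' implementation: the function names `emd_distance` and `cross_reference_weights`, the cosine-distance cost, and the particular weighting rule are assumptions made for illustration, and the sketch omits the differentiable layer and the structured fully connected layer.

```python
import numpy as np
from scipy.optimize import linprog


def cross_reference_weights(u, v):
    """Hypothetical weighting rule (an assumption, not taken from the abstract):
    weight each local feature in u by its non-negative dot product with the
    mean feature of the other image v, then normalize to sum to 1."""
    w = np.maximum(u @ v.mean(axis=0), 0.0) + 1e-8  # epsilon avoids an all-zero vector
    return w / w.sum()


def emd_distance(u, v, s, d):
    """EMD between two sets of local features via a transportation LP.

    u: (m, c) local features of image A.
    v: (n, c) local features of image B.
    s: (m,) supply weights, d: (n,) demand weights; both must sum to the same total.
    Returns (total_cost, flow), where flow has shape (m, n).
    """
    # Cost of matching element i to element j: cosine distance (assumed here).
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    cost = 1.0 - u @ v.T
    m, n = cost.shape

    # Flow x is flattened row-major, so x[i * n + j] is the flow from i to j.
    row_sums = np.kron(np.eye(m), np.ones((1, n)))  # sum_j x_ij = s_i
    col_sums = np.kron(np.ones((1, m)), np.eye(n))  # sum_i x_ij = d_j
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([s, d]),
        bounds=(0, None),  # flows are non-negative
        method="highs",
    )
    flow = res.x.reshape(m, n)
    return float((flow * cost).sum()), flow


# Usage: two images, each represented by a small set of local feature vectors.
rng = np.random.default_rng(0)
feat_a = rng.random((4, 8))
feat_b = rng.random((5, 8))
dist, flow = emd_distance(
    feat_a,
    feat_b,
    cross_reference_weights(feat_a, feat_b),
    cross_reference_weights(feat_b, feat_a),
)
```

Because the optimal flow is free to pair any region of one image with any region of the other, the resulting distance does not depend on where matching content appears in each image, and low-weight elements (for example, background regions under the assumed weighting) contribute little to the total cost.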