IEEE Trans Image Process. 2016 Apr;25(4):1713-25. doi: 10.1109/TIP.2016.2531289. Epub 2016 Feb 18.
In this paper, we propose a fine-grained image categorization system that is easy to deploy. The system is weakly supervised: we use no object or part annotations in either the training or the testing stage, only class labels for the training images. Fine-grained image categorization aims to classify objects that differ only in subtle ways (e.g., two breeds of dogs that look alike). Most existing work relies heavily on object/part detectors to build the correspondence between object parts, and these detectors require accurate object or part annotations, at least for the training images. The cost of such annotations prevents these methods from being widely used. Instead, we propose to generate multi-scale part proposals from object proposals, select the useful part proposals, and use them to compute a global image representation for categorization. This design specifically targets the weakly supervised fine-grained categorization task: useful parts have been shown to play a critical role in existing annotation-dependent work, but accurate part detectors are hard to acquire. With the proposed image representation, we can further detect and visualize the key (most discriminative) parts in objects of different classes. In the experiments, the proposed weakly supervised method achieves comparable or better accuracy than the state-of-the-art weakly supervised methods and most existing annotation-dependent methods on three challenging datasets. Its success suggests that it is not always necessary to learn expensive object/part detectors in fine-grained image categorization.
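The three steps named in the abstract (generate multi-scale part proposals from an object proposal, select useful ones, pool them into a global representation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the sliding-grid scheme, the `scales` and `steps` parameters, the externally supplied usefulness scores, and the use of mean pooling as a stand-in for the global representation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def part_proposals(obj: Box, scales: Sequence[float] = (0.5, 0.75),
                   steps: int = 3) -> List[Box]:
    """Enumerate sub-windows of one object proposal at several scales.

    Hypothetical scheme: for each scale s, a window of size
    (s * width, s * height) is placed on a steps x steps grid of
    evenly spaced offsets inside the object box.
    """
    x0, y0, x1, y1 = obj
    w, h = x1 - x0, y1 - y0
    parts = []
    for s in scales:
        pw, ph = max(1, round(w * s)), max(1, round(h * s))
        for i in range(steps):
            for j in range(steps):
                px = x0 + (w - pw) * i // max(1, steps - 1)
                py = y0 + (h - ph) * j // max(1, steps - 1)
                parts.append((px, py, px + pw, py + ph))
    return parts


def select_parts(parts: List[Box], scores: Sequence[float], k: int) -> List[Box]:
    """Keep the k parts with the highest (externally supplied) scores."""
    order = sorted(range(len(parts)), key=lambda n: scores[n], reverse=True)
    return [parts[n] for n in order[:k]]


def mean_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Placeholder global representation: average the per-part feature vectors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]
```

In such a sketch, the scores passed to `select_parts` and the feature vectors passed to `mean_pool` would come from whatever model is applied to each part; how the paper actually scores parts and encodes them is not stated in the abstract.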