IEEE Trans Image Process. 2016 Dec;25(12):5678-5688. doi: 10.1109/TIP.2016.2612829. Epub 2016 Sep 22.
Learning high-level image representations using object proposals has achieved remarkable success in multi-label image recognition. However, most object proposals provide only coarse information about the objects, and only carefully selected proposals help boost the performance of multi-label image recognition. In this paper, we propose an object-proposal-free framework for multi-label image recognition: random crop pooling (RCP). RCP applies stochastic scaling and cropping to images before feeding them to a standard convolutional neural network; combined with a max-pooling operation, this works well for recognizing the complex contents of multi-label images. To better fit the multi-label image recognition task, we further develop a new loss function, the dynamic weighted Euclidean loss, for training the deep network. Our RCP approach is simple yet effective. It achieves significantly better image recognition performance than approaches that use object proposals. Moreover, our adapted network can be easily trained in an end-to-end manner. Extensive experiments are conducted on two representative multi-label image recognition data sets (i.e., PASCAL VOC 2007 and PASCAL VOC 2012), and the results clearly demonstrate the superiority of our approach.
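The RCP idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the number of crops, the scale range, the nearest-neighbour rescaling, and the choice to max-pool per-label scores across crops are all assumptions made for illustration, and `score_fn` is a hypothetical stand-in for the convolutional network. The dynamic weighted Euclidean loss is not sketched because the abstract does not define its weights.

```python
import numpy as np


def random_crops(image, num_crops, crop_size, scale_range, rng):
    """Yield square crops from randomly rescaled copies of `image` (H, W, C)."""
    h, w = image.shape[:2]
    for _ in range(num_crops):
        scale = rng.uniform(*scale_range)
        # Keep each side at least crop_size so a full crop always fits.
        new_h = max(crop_size, int(round(h * scale)))
        new_w = max(crop_size, int(round(w * scale)))
        # Nearest-neighbour rescale by index sampling (assumed, for simplicity).
        rows = np.arange(new_h) * h // new_h
        cols = np.arange(new_w) * w // new_w
        scaled = image[rows][:, cols]
        # Random top-left corner of the crop window.
        top = rng.integers(0, new_h - crop_size + 1)
        left = rng.integers(0, new_w - crop_size + 1)
        yield scaled[top:top + crop_size, left:left + crop_size]


def rcp_scores(image, score_fn, num_crops=10, crop_size=224,
               scale_range=(0.5, 1.5), seed=0):
    """Score random crops with `score_fn` and max-pool the scores per label.

    `score_fn` is a hypothetical placeholder for a CNN: it maps one crop of
    shape (crop_size, crop_size, C) to a 1-D array of per-label scores.
    """
    rng = np.random.default_rng(seed)
    crops = random_crops(image, num_crops, crop_size, scale_range, rng)
    per_crop = np.stack([score_fn(c) for c in crops])  # (num_crops, num_labels)
    return per_crop.max(axis=0)  # max-pooling across crops
```

Taking the maximum over crops means a label is scored by its best-matching crop, which is one plausible reading of why the pooling suits images that contain several objects at different positions and scales.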