Wei Yunchao, Xia Wei, Lin Min, Huang Junshi, Ni Bingbing, Dong Jian, Zhao Yao, Yan Shuicheng
IEEE Trans Pattern Anal Mach Intell. 2016 Sep 1;38(9):1901-1907. doi: 10.1109/TPAMI.2015.2491929. Epub 2015 Oct 26.
Convolutional Neural Networks (CNNs) have demonstrated promising performance on single-label image classification tasks. However, how CNNs can best cope with multi-label images remains an open problem, mainly due to complex underlying object layouts and the scarcity of multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), in which an arbitrary number of object segment hypotheses are taken as inputs, a shared CNN is applied to each hypothesis, and the CNN outputs from the different hypotheses are aggregated with max pooling to produce the final multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained on a large-scale single-label image dataset, e.g., ImageNet; and 4) it naturally outputs multi-label prediction results. Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, the mAP reaches 90.5% with HCP alone and 93.2% after fusion with our complementary result in [44] based on hand-crafted features on the VOC 2012 dataset.
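The max-pooling aggregation step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the per-hypothesis label scores that the shared CNN would produce are replaced by a toy array, and the score values and label count are invented for the example.

```python
import numpy as np

def hcp_max_pool(hypothesis_scores):
    """Aggregate per-hypothesis label scores with cross-hypothesis max pooling.

    hypothesis_scores: array of shape (num_hypotheses, num_labels), where each
    row is the shared CNN's score vector for one object segment hypothesis.
    Returns a (num_labels,) vector: for each label, the best score any
    hypothesis achieved, which serves as the multi-label prediction.
    """
    scores = np.asarray(hypothesis_scores, dtype=float)
    return scores.max(axis=0)

# Toy example: 3 segment hypotheses, 4 candidate labels.
# A noisy or redundant hypothesis (e.g. the last row) cannot lower a label's
# final score, which is why HCP is robust to such hypotheses.
scores = np.array([[0.9, 0.1, 0.2, 0.0],
                   [0.2, 0.8, 0.1, 0.0],
                   [0.1, 0.2, 0.1, 0.1]])
print(hcp_max_pool(scores))  # [0.9 0.8 0.2 0.1]
```

Each label's final score is driven by the single hypothesis that responds to it most strongly, so one clean hypothesis per object is enough for a correct multi-label prediction.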