
Locally Supervised Deep Hybrid Model for Scene Recognition.

Publication Information

IEEE Trans Image Process. 2017 Feb;26(2):808-820. doi: 10.1109/TIP.2016.2629443. Epub 2016 Nov 16.

Abstract

Convolutional neural networks (CNNs) have recently achieved remarkable success in various image classification and understanding tasks. The deep features obtained at the top fully connected layer of the CNN (FC-features) exhibit rich global semantic information and are extremely effective in image classification. On the other hand, the convolutional features in the middle layers of the CNN also contain meaningful local information, but are not fully explored for image representation. In this paper, we propose a novel locally supervised deep hybrid model (LS-DHM) that effectively enhances and exploits the convolutional features for scene recognition. First, we notice that the convolutional features capture local objects and fine structures of scene images, which yield important cues for discriminating ambiguous scenes, whereas these features are largely discarded in the highly compressed FC representation. Second, we propose a new local convolutional supervision layer to enhance the local structure of the image by directly propagating the label information to the convolutional layers. Third, we propose an efficient Fisher convolutional vector (FCV) that successfully rescues the orderless mid-level semantic information (e.g., objects and textures) of scene images. The FCV encodes the large-sized convolutional maps into a fixed-length mid-level representation, and is demonstrated to be strongly complementary to the high-level FC-features. Finally, both the FCV and FC-features are collaboratively employed in the LS-DHM representation, which achieves outstanding performance in our experiments. It obtains 83.75% and 67.56% accuracies, respectively, on the heavily benchmarked MIT Indoor67 and SUN397 data sets, advancing the state of the art substantially.
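The abstract describes the hybrid representation only at a high level. The sketch below illustrates one plausible reading of it: each spatial position of a convolutional feature map is treated as a local descriptor, the descriptors are encoded into a fixed-length Fisher vector (the FCV), and the result is concatenated with the FC-features. All function names, layer choices (e.g., conv4-style maps, fc7-style features), GMM size, and tensor shapes are illustrative assumptions rather than the authors' released implementation, and the local convolutional supervision layer used during training is not reproduced here.

```python
# Minimal sketch of an LS-DHM-style hybrid representation, assuming a standard
# Fisher-vector encoding of convolutional activations (not the authors' code).
import numpy as np
from sklearn.mixture import GaussianMixture

def conv_maps_to_descriptors(conv_maps):
    """Flatten a (C, H, W) convolutional map into H*W local descriptors of dim C."""
    c, h, w = conv_maps.shape
    return conv_maps.reshape(c, h * w).T  # (H*W, C)

def fisher_vector(descriptors, gmm):
    """Encode local descriptors with first- and second-order GMM statistics."""
    q = gmm.predict_proba(descriptors)               # (N, K) soft assignments
    n, _ = descriptors.shape
    means = gmm.means_                               # (K, D)
    sigmas = np.sqrt(gmm.covariances_)               # diagonal covariances assumed
    parts = []
    for j in range(gmm.n_components):
        diff = (descriptors - means[j]) / sigmas[j]  # (N, D)
        w = gmm.weights_[j]
        g_mu = (q[:, j:j + 1] * diff).sum(0) / (n * np.sqrt(w))
        g_sig = (q[:, j:j + 1] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w))
        parts.extend([g_mu, g_sig])
    fv = np.concatenate(parts)
    # Power- and L2-normalization, as is standard for Fisher vectors.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

def ls_dhm_representation(conv_maps, fc_features, gmm):
    """Concatenate the FCV (mid-level) with the FC-features (high-level)."""
    fcv = fisher_vector(conv_maps_to_descriptors(conv_maps), gmm)
    return np.concatenate([fcv, fc_features])

# Example with random stand-ins for real CNN activations.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(5000, 256))    # pooled conv descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag").fit(train_descriptors)
conv_maps = rng.normal(size=(256, 14, 14))          # e.g. a mid-layer activation map
fc_features = rng.normal(size=(4096,))              # e.g. top FC-layer activations
rep = ls_dhm_representation(conv_maps, fc_features, gmm)
print(rep.shape)  # (2*4*256 + 4096,) = (6144,)
```

In a real pipeline the GMM would be fit on descriptors pooled from training images, and the concatenated representation would be fed to a linear classifier such as an SVM, as is typical for Fisher-vector-based scene recognition.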

