Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.

Publication Information

IEEE Trans Image Process. 2017 Apr;26(4):2055-2068. doi: 10.1109/TIP.2017.2675339. Epub 2017 Feb 24.

Abstract

Convolutional neural networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets such as Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on large-scale scene recognition and makes two major contributions to tackle these issues. First, we propose a multi-resolution CNN architecture that captures visual content and structure at multiple levels. The multi-resolution CNNs are composed of coarse-resolution CNNs and fine-resolution CNNs, which are complementary to each other. Second, we design two knowledge-guided disambiguation techniques to deal with the problem of label ambiguity: 1) we exploit the knowledge from the confusion matrix computed on validation data to merge ambiguous classes into a super category and 2) we utilize the knowledge of extra networks to produce a soft label for each image. Then, the super categories or soft labels are employed to guide CNN training on Places2. We conduct extensive experiments on three large-scale image datasets (ImageNet, Places, and Places2), demonstrating the effectiveness of our approach. Furthermore, our method takes part in two major scene recognition challenges, achieving second place at the Places2 challenge in ILSVRC 2015 and first place at the LSUN challenge in CVPR 2016. Finally, we directly test the learned representations on other scene benchmarks and obtain new state-of-the-art results on MIT Indoor67 (86.7%) and SUN397 (72.0%). We release the code and models at https://github.com/wanglimin/MRCNN-Scene-Recognition.
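
To make the two ideas from the abstract concrete, below is a minimal PyTorch-style sketch of (a) training a CNN with soft labels produced by an extra network and (b) fusing a coarse-resolution stream and a fine-resolution stream at test time. The backbones, the class count `NUM_CLASSES`, the mixing weight `alpha`, and the equal fusion weighting are illustrative assumptions, not the authors' released MR-CNN configuration.

```python
# Minimal sketch (PyTorch) of soft-label-guided training and two-stream fusion.
# Backbones, NUM_CLASSES, and alpha are placeholders, not the paper's exact setup.
import torch
import torch.nn.functional as F
from torchvision import models

NUM_CLASSES = 365  # placeholder: set to the dataset's actual number of scene classes

# Two complementary streams trained on inputs of different resolutions.
coarse_cnn = models.resnet18(num_classes=NUM_CLASSES)  # fed lower-resolution crops
fine_cnn = models.resnet18(num_classes=NUM_CLASSES)    # fed higher-resolution crops

# "Extra network" whose predictions serve as soft labels (kept frozen here).
extra_net = models.resnet50(num_classes=NUM_CLASSES)
extra_net.eval()

def soft_label_loss(logits, hard_labels, soft_labels, alpha=0.5):
    """Blend the usual cross-entropy with a KL term against the soft label
    produced by the extra network (a distillation-style objective)."""
    ce = F.cross_entropy(logits, hard_labels)
    kl = F.kl_div(F.log_softmax(logits, dim=1), soft_labels, reduction="batchmean")
    return (1.0 - alpha) * ce + alpha * kl

def training_step(model, optimizer, images, labels):
    """One optimization step guided by both hard labels and soft labels."""
    with torch.no_grad():
        soft_labels = F.softmax(extra_net(images), dim=1)
    logits = model(images)
    loss = soft_label_loss(logits, labels, soft_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(images_coarse, images_fine):
    """Fuse the two streams at test time by averaging their class posteriors."""
    with torch.no_grad():
        p_coarse = F.softmax(coarse_cnn(images_coarse), dim=1)
        p_fine = F.softmax(fine_cnn(images_fine), dim=1)
    return 0.5 * (p_coarse + p_fine)
```

The fixed `alpha` and equal fusion weights are simplifications; the paper's other disambiguation technique, merging ambiguous classes into super categories via the validation confusion matrix, is not shown in this sketch.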
