Multi-Scale Multi-Feature Context Modeling for Scene Recognition in the Semantic Manifold.

Publication information

IEEE Trans Image Process. 2017 Jun;26(6):2721-2735. doi: 10.1109/TIP.2017.2686017. Epub 2017 Mar 22.

Abstract

Before the big data era, scene recognition was often approached with two-step inference using localized intermediate representations (objects, topics, and so on). One such approach is the semantic manifold (SM), in which patches and images are modeled as points in a semantic probability simplex. Patch models are learned with weak supervision from image labels, which leads to scene categories co-occurring in this semantic space. Fortunately, each category has its own co-occurrence patterns that are consistent across the images in that category. Discovering and modeling these patterns is therefore critical for improving recognition performance with this representation. Since the emergence of large data sets such as ImageNet and Places, these approaches have been relegated in favor of the much more powerful convolutional neural networks (CNNs), which can automatically learn multi-layered representations from the data. In this paper, we address many limitations of the original SM approach and related work. We propose discriminative patch representations using neural networks and further propose a hybrid architecture in which the semantic manifold is built on top of multiscale CNNs. Both representations can be computed significantly faster than the Gaussian mixture models of the original SM. To combine multiple scales, spatial relations, and multiple features, we formulate rich context models using Markov random fields (MRFs). To solve the optimization problem, we analyze both global and local approaches, among which a top-down hierarchical algorithm performs best. Experimental results show that jointly exploiting different types of contextual relations consistently improves recognition accuracy.
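The abstract does not give the model's equations, so the following is only a toy sketch of the two ideas it names, under our own assumptions. Raw per-patch class scores are hypothetical stand-ins for the patch classifiers; `softmax` maps each patch to a point on the probability simplex, and `image_semantic_multinomial` (a hypothetical helper) takes the image-level point as the mean of its patch points. Context across scales is modeled as a chain MRF with a Potts penalty `beta`, solved by iterated conditional modes (ICM). ICM is a standard local optimizer swapped in for illustration; it is not the paper's top-down hierarchical algorithm.

```python
import math

def softmax(scores):
    """Map raw class scores to a point on the probability simplex."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def image_semantic_multinomial(patch_scores):
    """Average the patch-level multinomials (assumed aggregation rule).

    A mean of points on the simplex is itself on the simplex.
    """
    patches = [softmax(s) for s in patch_scores]
    k = len(patches[0])
    return [sum(p[c] for p in patches) / len(patches) for c in range(k)]

def icm(unaries, edges, beta, iters=10):
    """Iterated conditional modes on a pairwise MRF with a Potts term.

    unaries[i][c]: cost of giving node i class c (here -log probability).
    edges: (i, j) pairs of neighboring nodes.
    beta: cost added for each neighbor holding a different label.
    """
    n = len(unaries)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from the independent (context-free) decision at each node.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unaries]
    for _ in range(iters):
        changed = False
        for i in range(n):
            costs = [unaries[i][c] + beta * sum(labels[j] != c for j in neighbors[i])
                     for c in range(len(unaries[i]))]
            best = min(range(len(costs)), key=costs.__getitem__)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum
            break
    return labels

# Hypothetical scores for 3 scene classes at three scales (nodes 0..2).
scales = [
    [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]],  # scale 0: two patches, favors class 0
    [[0.4, 0.6, 0.3], [0.5, 0.7, 0.2]],  # scale 1: ambiguous, leans to class 1
    [[1.8, 0.2, 0.3]],                   # scale 2: favors class 0
]
unaries = [[-math.log(p) for p in image_semantic_multinomial(s)] for s in scales]
print(icm(unaries, edges=[(0, 1), (1, 2)], beta=0.0))  # → [0, 1, 0]
print(icm(unaries, edges=[(0, 1), (1, 2)], beta=0.5))  # → [0, 0, 0]
```

With `beta=0.0` each scale decides independently and the ambiguous middle scale picks class 1. With `beta=0.5` the neighboring scales pull it to class 0, a small instance of context correcting a locally ambiguous decision.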
