
Scene Parsing With Integration of Parametric and Non-Parametric Models.

Publication Information

IEEE Trans Image Process. 2016 May;25(5):2379-91. doi: 10.1109/TIP.2016.2533862.

Abstract

We adopt convolutional neural networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) is learned, in which each CNN component focuses on learning different and complementary visual patterns. The local beliefs of pixels are output by the CNN-Ensemble. Considering that visually similar pixels are indistinguishable under local context, we leverage the global scene semantics to alleviate the local ambiguity. The global scene constraint is mathematically achieved by adding a global energy term to the labeling energy function, and it is practically estimated in a non-parametric framework. A large-margin-based CNN metric learning method is also proposed for better global belief estimation. Finally, the integration of local and global beliefs gives rise to the class likelihood of pixels, based on which maximum marginal inference is performed to generate the label prediction maps. Even without any post-processing, we achieve state-of-the-art results on the challenging SiftFlow and Barcelona benchmarks.
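The final inference step described above — combining per-pixel local beliefs with a scene-level global belief, then labeling each pixel by maximum marginal inference — can be sketched as follows. This is an illustrative sketch only: the array names, shapes, and the simple multiplicative combination rule are assumptions for demonstration, not the paper's exact energy formulation.

```python
import numpy as np

# Hypothetical setup: an H x W x C array of local beliefs (per-pixel class
# distributions from the CNN-Ensemble) and a length-C global belief vector
# (a scene-level class prior estimated non-parametrically from retrieved
# similar scenes). All values here are random placeholders.
H, W, C = 4, 4, 3
rng = np.random.default_rng(0)

local_beliefs = rng.random((H, W, C))
local_beliefs /= local_beliefs.sum(axis=-1, keepdims=True)  # normalize per pixel

global_beliefs = rng.random(C)
global_beliefs /= global_beliefs.sum()  # normalize scene-level prior

# Integrate local and global beliefs into a per-pixel class likelihood
# (here, a simple product), then pick the maximizing label at each pixel
# -- the maximum marginal inference step.
likelihood = local_beliefs * global_beliefs[None, None, :]
label_map = likelihood.argmax(axis=-1)

print(label_map.shape)  # (4, 4)
```

The global term reweights every pixel's class distribution uniformly, so classes that are implausible for the retrieved scene type are suppressed even where local evidence is ambiguous.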

