Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, Wangmeng Zuo
IEEE Trans Pattern Anal Mach Intell. 2019 Mar;41(3):596-610. doi: 10.1109/TPAMI.2018.2799846. Epub 2018 Jan 30.
This paper investigates a fundamental problem of scene understanding: how to parse a scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations). We propose a deep architecture consisting of two networks: i) a convolutional neural network (CNN) extracting the image representation for pixel-wise object labeling and ii) a recursive neural network (RsNN) discovering the hierarchical object structure and the inter-object relations. Rather than relying on elaborate annotations (e.g., manually labeled semantic maps and relations), we train our deep model in a weakly-supervised manner by leveraging the descriptive sentences of the training images. Specifically, we decompose each sentence into a semantic tree consisting of nouns and verb phrases, and apply these tree structures to discover the configurations of the training images. Once these scene configurations are determined, the parameters of both the CNN and RsNN are updated accordingly by backpropagation. The entire model training is accomplished through an Expectation-Maximization method. Extensive experiments show that our model is capable of producing meaningful scene configurations and achieving more favorable scene labeling results on two benchmarks (i.e., PASCAL VOC 2012 and SYSU-Scenes) compared with other state-of-the-art weakly-supervised deep learning methods. In particular, SYSU-Scenes contains more than 5,000 scene images with their semantic sentence descriptions, which we created to advance research on scene parsing.
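To make the alternating training scheme concrete, the sketch below shows a toy version of the EM loop the abstract describes: an E-step infers a latent pixel labeling consistent with the nouns in a sentence-derived semantic tree, and an M-step updates model parameters toward that inferred configuration. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: the CNN and RsNN are collapsed into a single linear scorer over toy features, the semantic parse is a fixed noun/verb-phrase pattern, and all function names are hypothetical.

```python
# Hypothetical sketch of EM-style weakly-supervised training from image
# descriptions. The real model uses a CNN for pixel labeling and an RsNN
# for the object hierarchy; here both are reduced to a linear scorer.
import numpy as np

rng = np.random.default_rng(0)

CLASSES = ["person", "horse", "background"]

def sentence_to_tree(sentence):
    """Toy decomposition of a description into (noun, verb phrase, noun);
    the paper uses a proper semantic parse of nouns and verb phrases."""
    words = sentence.lower().rstrip(".").split()
    return {"left": words[0], "relation": " ".join(words[1:-1]), "right": words[-1]}

def e_step(pixel_feats, tree, W):
    """E-step: pick the latent pixel labeling most consistent with the
    nouns mentioned in the sentence tree (a stand-in for inferring the
    full scene configuration)."""
    allowed = [CLASSES.index(tree["left"]), CLASSES.index(tree["right"]),
               CLASSES.index("background")]
    scores = pixel_feats @ W                     # (n_pixels, n_classes)
    masked = np.full_like(scores, -np.inf)       # forbid unmentioned classes
    masked[:, allowed] = scores[:, allowed]
    return masked.argmax(axis=1)                 # pseudo ground-truth labels

def m_step(pixel_feats, labels, W, lr=0.1):
    """M-step: one softmax cross-entropy gradient step toward the labels
    inferred in the E-step (the paper backpropagates through both the
    CNN and the RsNN here)."""
    logits = pixel_feats @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(len(CLASSES))[labels]
    grad = pixel_feats.T @ (probs - onehot) / len(labels)
    return W - lr * grad

# Toy data: 50 "pixels" with 8-dim features and one weak description.
pixel_feats = rng.normal(size=(50, 8))
tree = sentence_to_tree("person rides a horse.")

W = rng.normal(scale=0.01, size=(8, len(CLASSES)))
for _ in range(10):                              # alternate E- and M-steps
    labels = e_step(pixel_feats, tree, W)
    W = m_step(pixel_feats, labels, W)
```

In this reading, the sentence supplies only weak supervision: it constrains which object classes may appear, while the E-step resolves where they appear, which is why no manually labeled semantic maps are needed.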