Mi Jinpeng, Lyu Jianzhi, Tang Song, Li Qingdu, Zhang Jianwei
Institute of Machine Intelligence (IMI), University of Shanghai for Science and Technology, Shanghai, China.
Technical Aspects of Multimodal Systems, Department of Informatics, University of Hamburg, Hamburg, Germany.
Front Neurorobot. 2020 Jun 25;14:43. doi: 10.3389/fnbot.2020.00043. eCollection 2020.
Natural language provides an intuitive and effective interaction interface between humans and robots. Multiple approaches have been proposed to address natural language visual grounding for human-robot interaction. However, most existing approaches handle the ambiguity of natural language queries and ground target objects via dialogue systems, which makes interaction cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. The network excavates visual semantics via a visual semantic-aware network and exploits the rich linguistic context in expressions via a language attention network. Furthermore, we combine the referring expression comprehension network with scene graph parsing to ground unrestricted and complicated natural language queries. Finally, we validate the performance of the referring expression comprehension network on three public datasets, and we evaluate the effectiveness of the interactive natural language grounding architecture by grounding extensive natural language queries in different household scenarios.
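The two-stage pipeline described above can be sketched in miniature: first parse a complicated query into a simple scene graph of (subject, relation, object) triples, then ground each phrase against candidate regions. This is an illustrative sketch only, not the authors' code: the toy rule-based parser and the label-matching scorer are hypothetical stand-ins for the learned scene graph parser and the referring expression comprehension network.

```python
# Hypothetical sketch of the two-stage grounding pipeline (not the paper's
# implementation): a rule-based scene graph parser plus a placeholder
# grounding function standing in for the comprehension network.
from dataclasses import dataclass

@dataclass
class Region:
    name: str            # object label from a hypothetical detector
    box: tuple           # bounding box (x, y, w, h)

def parse_scene_graph(query):
    """Toy parser: split an 'A <relation> B' query into one triple.
    A real system would use a learned scene graph parser."""
    for rel in ("on", "next to", "left of"):
        marker = f" {rel} "
        if marker in query:
            subj, obj = query.split(marker, 1)
            return (subj.strip(), rel, obj.strip())
    return (query.strip(), None, None)

def ground(phrase, regions):
    """Placeholder grounding: pick the region whose label occurs in the
    phrase (stands in for the referring expression comprehension net)."""
    for region in regions:
        if region.name in phrase:
            return region
    return None

# Usage: ground the subject of a relational query in a household scene.
regions = [Region("cup", (40, 20, 30, 30)), Region("table", (0, 50, 200, 80))]
subj, rel, obj = parse_scene_graph("the cup on the table")
target = ground(subj, regions)
```

The design point is the decomposition itself: parsing the query into a graph lets the same comprehension network, trained only on referring expressions, handle longer unrestricted queries one phrase at a time.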