Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing.

Author Information

Mi Jinpeng, Lyu Jianzhi, Tang Song, Li Qingdu, Zhang Jianwei

Affiliations

Institute of Machine Intelligence (IMI), University of Shanghai for Science and Technology, Shanghai, China.

Technical Aspects of Multimodal Systems, Department of Informatics, University of Hamburg, Hamburg, Germany.

Publication Information

Front Neurorobot. 2020 Jun 25;14:43. doi: 10.3389/fnbot.2020.00043. eCollection 2020.

Abstract

Natural language provides an intuitive and effective interaction interface between human beings and robots. Multiple approaches have been proposed to address natural language visual grounding for human-robot interaction. However, most existing approaches handle the ambiguity of natural language queries and ground target objects via dialogue systems, which makes the interaction cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. The network mines visual semantics via a visual semantic-aware network and exploits the rich linguistic context in expressions via a language attention network. Furthermore, we combine the referring expression comprehension network with scene graph parsing to ground unrestricted and complicated natural language queries. Finally, we validate the performance of the referring expression comprehension network on three public datasets, and we evaluate the effectiveness of the interactive natural language grounding architecture by grounding extensive natural language queries in different household scenarios.

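The architecture described in the abstract can be read as a two-stage pipeline: parse a complicated query into a scene graph whose nodes are referring expressions, then score each expression against candidate object regions with the comprehension network. Below is a minimal, hypothetical Python sketch of that control flow only; parse_scene_graph, comprehension_score, Region, and SceneGraphNode are illustrative placeholders and do not reflect the authors' actual models or code.

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Region:
    box: Tuple[int, int, int, int]  # (x, y, w, h) of a candidate object region
    label: str                      # detector label, e.g. "mug"

@dataclass
class SceneGraphNode:
    phrase: str                # referring expression, e.g. "the red mug"
    relations: Dict[str, str]  # e.g. {"on": "the wooden table"}

def parse_scene_graph(query: str) -> List[SceneGraphNode]:
    # Placeholder for the scene graph parser that decomposes an unrestricted
    # query into referring expressions and their relations; here the whole
    # query is returned as a single node purely for illustration.
    return [SceneGraphNode(phrase=query, relations={})]

def comprehension_score(node: SceneGraphNode, region: Region) -> float:
    # Placeholder for the referring expression comprehension network, which
    # in the paper fuses visual semantics (visual semantic-aware network)
    # with attended language features (language attention network).
    # Toy heuristic: score by word overlap between the phrase and the label.
    words = set(node.phrase.lower().split())
    return 1.0 if region.label.lower() in words else 0.0

def ground(query: str, regions: List[Region]) -> Dict[str, Region]:
    # Ground each referring expression parsed from the query to the
    # highest-scoring candidate region.
    result = {}
    for node in parse_scene_graph(query):
        result[node.phrase] = max(regions, key=lambda r: comprehension_score(node, r))
    return result

if __name__ == "__main__":
    regions = [Region((0, 0, 40, 40), "laptop"), Region((60, 10, 30, 30), "mug")]
    print(ground("hand me the red mug", regions))

The point of the scene-graph decomposition is that arbitrarily complicated queries reduce to repeated single-expression grounding, so the same comprehension model can handle every node of the parsed graph.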

Similar Articles

1. Relationship-Embedded Representation Learning for Grounding Referring Expressions. IEEE Trans Pattern Anal Mach Intell. 2021 Aug;43(8):2765-2779. doi: 10.1109/TPAMI.2020.2973983. Epub 2021 Jul 1.
2. Intention-Related Natural Language Grounding via Object Affordance Detection and Intention Semantic Extraction. Front Neurorobot. 2020 May 13;14:26. doi: 10.3389/fnbot.2020.00026. eCollection 2020.
3. Unambiguous Scene Text Segmentation with Referring Expression Comprehension. IEEE Trans Image Process. 2019 Jul 26. doi: 10.1109/TIP.2019.2930176.
4. Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions. IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):347-359. doi: 10.1109/TPAMI.2019.2926266. Epub 2020 Dec 4.
5. Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding. IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12601-12617. doi: 10.1109/TPAMI.2023.3274139. Epub 2023 Sep 5.
6. Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding. IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3003-3018. doi: 10.1109/TPAMI.2022.3186410. Epub 2023 Feb 3.
7. PiGLET: Pixel-Level Grounding of Language Expressions With Transformers. IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12206-12221. doi: 10.1109/TPAMI.2023.3286760. Epub 2023 Sep 5.
8. Cross-Modal Progressive Comprehension for Referring Segmentation. IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):4761-4775. doi: 10.1109/TPAMI.2021.3079993. Epub 2022 Aug 4.
9. Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding. Entropy (Basel). 2024 Aug 27;26(9):730. doi: 10.3390/e26090730.

Cited By

1. Knowledge enhanced bottom-up affordance grounding for robotic interaction. PeerJ Comput Sci. 2024 Jul 5;10:e2097. doi: 10.7717/peerj-cs.2097. eCollection 2024.

References

1. Semantic Mapping Based on Spatial Concepts for Grounding Words Related to Places in Daily Environments. Front Robot AI. 2019 May 28;6:31. doi: 10.3389/frobt.2019.00031. eCollection 2019.
