Mi Jinpeng, Liang Hongzhuo, Katsakis Nikolaos, Tang Song, Li Qingdu, Zhang Changshui, Zhang Jianwei
Institute of Machine Intelligence (IMI), University of Shanghai for Science and Technology, Shanghai, China.
Technical Aspects of Multimodal Systems, Department of Informatics, University of Hamburg, Hamburg, Germany.
Front Neurorobot. 2020 May 13;14:26. doi: 10.3389/fnbot.2020.00026. eCollection 2020.
Similar to specific natural language instructions, intention-related natural language queries also play an essential role in our daily communication. Inspired by the psychology term "affordance" and its applications in human-robot interaction, we propose an object affordance-based natural language visual grounding architecture to ground intention-related natural language queries. Formally, we first present an attention-based multi-visual feature fusion network to detect object affordances from RGB images. When fusing deep visual features extracted from a pre-trained CNN model with deep texture features encoded by a deep texture encoding network, the proposed object affordance detection network accounts for the interaction among the multi-visual features and preserves their complementary nature by integrating attention weights learned from sparse representations of those features. We train and validate the attention-based object affordance recognition network on a self-built dataset whose images largely originate from MSCOCO and ImageNet. Moreover, we introduce an intention semantic extraction module to extract intention semantics from intention-related natural language queries. Finally, we ground intention-related natural language queries by integrating the detected object affordances with the extracted intention semantics. We conduct extensive experiments to validate the performance of the object affordance detection network and the grounding architecture for intention-related natural language queries.
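The fusion step the abstract describes, attention weights learned from sparse representations of two feature streams and used to combine them, can be sketched as follows. This is a minimal NumPy illustration under assumed simplifications (soft-thresholding as a stand-in for sparse coding, a single linear scoring layer `W`, `b`, and a softmax over the two modalities); the paper's actual network architecture and training procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(x, lam=0.1):
    # Crude proxy for a sparse representation: zero out small activations.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def attention_fuse(visual_feat, texture_feat, W, b):
    """Fuse two feature vectors using attention weights derived from
    their sparse representations (hypothetical parameterization)."""
    sparse = soft_threshold(np.concatenate([visual_feat, texture_feat]))
    logits = W @ sparse + b                # one score per modality
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax over the two modalities
    fused = weights[0] * visual_feat + weights[1] * texture_feat
    return fused, weights

d = 8  # toy feature dimension
visual = rng.standard_normal(d)    # e.g., pre-trained CNN features
texture = rng.standard_normal(d)   # e.g., deep texture encoding features
W = rng.standard_normal((2, 2 * d)) * 0.1
b = np.zeros(2)
fused, w = attention_fuse(visual, texture, W, b)
print(w)  # two attention weights summing to 1
```

Because the weights come from a softmax, the fused vector is a convex combination of the two streams, which is one simple way to preserve their complementary contributions.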