Knowledge enhanced bottom-up affordance grounding for robotic interaction.

Author information

Qu Wen, Li Xiao, Jin Xiao

Affiliation

Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China.

Publication information

PeerJ Comput Sci. 2024 Jul 5;10:e2097. doi: 10.7717/peerj-cs.2097. eCollection 2024.

DOI: 10.7717/peerj-cs.2097

PMID: 38983207

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11232630/

Abstract

With the rapid advancement of robotics technology, an increasing number of researchers are exploring natural language as a communication channel between humans and robots. In language-conditioned manipulation grounding, prevailing methods rely heavily on supervised multimodal deep learning, in which robots assimilate knowledge from both language instructions and visual input. However, these approaches lack external knowledge for comprehending natural language instructions, and they require large amounts of paired data in which vision and language are usually linked through manual annotation when realistic datasets are created. To address these problems, we propose the knowledge enhanced bottom-up affordance grounding network (KBAG-Net), which enhances natural language understanding through external knowledge and thereby improves the accuracy of object grasping affordance segmentation. In addition, we introduce a semi-automatic data generation method that facilitates the quick construction of language-following manipulation grounding datasets. Experimental results on two standard datasets show that, with external knowledge, our method outperforms existing methods. Specifically, it outperforms the two-stage method by 12.98% and 1.22% mIoU on the two datasets, respectively. For broader community engagement, we will make the semi-automatic data construction method publicly available at https://github.com/wmqu/Automated-Dataset-Construction4LGM.
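The results above are reported in mIoU (mean intersection-over-union), a standard metric for segmentation quality. The sketch below is a generic illustration of that metric, not the authors' evaluation code. It assumes flattened per-pixel integer label maps (for example, 0 for background and 1 for a graspable region) and skips classes that appear in neither the prediction nor the ground truth; benchmarks differ on that convention, so reported numbers may be computed differently.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Generic mIoU over flattened per-pixel label maps (illustrative sketch).

    Assumption: classes absent from both pred and target are skipped,
    so they neither raise nor lower the mean.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    ious: List[float] = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    # Average IoU over the classes that actually occur.
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In this toy case the mean is (1/2 + 2/3) / 2 = 7/12. Whole-dataset evaluations often accumulate intersections and unions over all images before dividing, which can give a different value from averaging per-image scores.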

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9bb0/11232630/37b1f481462a/peerj-cs-10-2097-g001.jpg

Similar articles

Front Neurorobot. 2020 May 13;14:26. doi: 10.3389/fnbot.2020.00026. eCollection 2020.

Learning Visual Affordance Grounding From Demonstration Videos.

IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16857-16871. doi: 10.1109/TNNLS.2023.3298638. Epub 2024 Oct 29.

Grounding human-object interaction to affordance behavior in multimodal datasets.

Front Artif Intell. 2023 Jan 30;6:1084740. doi: 10.3389/frai.2023.1084740. eCollection 2023.

Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera.

Sensors (Basel). 2016 Dec 13;16(12):2117. doi: 10.3390/s16122117.

Affordance Equivalences in Robotics: A Formalism.

Front Neurorobot. 2018 Jun 8;12:26. doi: 10.3389/fnbot.2018.00026. eCollection 2018.

Affordance embeddings for situated language understanding.

Front Artif Intell. 2022 Sep 23;5:774752. doi: 10.3389/frai.2022.774752. eCollection 2022.

Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding.

Entropy (Basel). 2024 Aug 27;26(9):730. doi: 10.3390/e26090730.

Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing.

Front Neurorobot. 2020 Jun 25;14:43. doi: 10.3389/fnbot.2020.00043. eCollection 2020.

Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry.

Med Image Anal. 2021 May;70:101994. doi: 10.1016/j.media.2021.101994. Epub 2021 Feb 7.
