Knowledge enhanced bottom-up affordance grounding for robotic interaction.

Authors

Qu Wen, Li Xiao, Jin Xiao

Affiliation

Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China.

Publication

PeerJ Comput Sci. 2024 Jul 5;10:e2097. doi: 10.7717/peerj-cs.2097. eCollection 2024.

Abstract

With the rapid advancement of robotics technology, an increasing number of researchers are exploring natural language as a communication channel between humans and robots. In language-conditioned manipulation grounding, prevailing methods rely heavily on supervised multimodal deep learning, in which robots assimilate knowledge from both language instructions and visual input. However, these approaches lack external knowledge for comprehending natural language instructions, and they are hindered by the need for large amounts of paired data, since vision and language are usually linked through manual annotation when realistic datasets are created. To address these problems, we propose the knowledge enhanced bottom-up affordance grounding network (KBAG-Net), which enhances natural language understanding through external knowledge and thereby improves the accuracy of object grasping affordance segmentation. In addition, we introduce a semi-automatic data generation method that facilitates the quick construction of a language-following manipulation grounding dataset. Experimental results on two standard datasets demonstrate that, with external knowledge, our method outperforms existing methods. Specifically, it outperforms the two-stage method by 12.98% and 1.22% in mIoU on the two datasets, respectively. For broader community engagement, we will make the semi-automatic data construction method publicly available at https://github.com/wmqu/Automated-Dataset-Construction4LGM.
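The abstract reports its gains in mIoU (mean intersection over union), the usual overlap score for segmentation masks. The following is a minimal sketch of how such a score can be computed for binary affordance masks. It is not the authors' evaluation code: the function names `iou` and `mean_iou`, the toy masks, the per-sample averaging, and the convention that two empty masks score 1.0 are all assumptions made for illustration. The abstract does not say whether the paper averages over samples or over classes.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds, targets):
    """Average the per-sample IoU over a list of mask pairs (assumed averaging scheme)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Hypothetical toy masks: sample A overlaps on 1 of 2 marked pixels, sample B matches exactly.
pred_a = [[1, 1, 0], [0, 0, 0]]
target_a = [[1, 0, 0], [0, 0, 0]]
pred_b = [[0, 1], [0, 1]]
target_b = [[0, 1], [0, 1]]

miou = mean_iou([pred_a, pred_b], [target_a, target_b])
print(miou)  # 0.75
```

Here sample A has an IoU of 0.5 and sample B an IoU of 1.0, so the mean is 0.75.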
