Qu Wen, Li Xiao, Jin Xiao
Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning, China.
PeerJ Comput Sci. 2024 Jul 5;10:e2097. doi: 10.7717/peerj-cs.2097. eCollection 2024.
With the rapid advancement of robotics technology, an increasing number of researchers are exploring the use of natural language as a communication channel between humans and robots. In language-conditioned manipulation grounding, prevailing methods rely heavily on supervised multimodal deep learning. In this paradigm, robots assimilate knowledge from both language instructions and visual input. However, these approaches lack external knowledge for comprehending natural language instructions, and they are hindered by their demand for large amounts of paired data, in which vision and language are usually linked through manual annotation when realistic datasets are created. To address these problems, we propose the knowledge enhanced bottom-up affordance grounding network (KBAG-Net), which enhances natural language understanding through external knowledge and thereby improves accuracy in object grasping affordance segmentation. In addition, we introduce a semi-automatic data generation method aimed at quickly establishing datasets for language-following manipulation grounding. The experimental results on two standard datasets demonstrate that our method, with external knowledge, outperforms existing methods. Specifically, our method outperforms the two-stage method by 12.98% and 1.22% in mIoU on the two datasets, respectively. For broader community engagement, we will make the semi-automatic data construction method publicly available at https://github.com/wmqu/Automated-Dataset-Construction4LGM.
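The results in the abstract are reported as mIoU (mean intersection over union), the usual overlap score for segmentation masks. The abstract does not say how the score is averaged, so the following is only a minimal sketch under stated assumptions: masks are flat lists of integer class labels, IoU is computed per class, and classes absent from both masks are skipped. The function name `mean_iou` and its parameters are hypothetical and are not taken from the paper or its repository.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class labels, one per pixel.
    Assumption: per-class averaging; the paper may average differently.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both masks; leave it out of the mean
        ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x2 masks, flattened row by row (0 = background, 1 = graspable region).
pred = [0, 1, 1, 1]
target = [0, 1, 0, 1]
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Skipping classes with an empty union avoids a division by zero and keeps absent classes from distorting the average.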