Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera.

Authors

Bao Jiatong, Jia Yunyi, Cheng Yu, Tang Hongru, Xi Ning

Affiliations

Department of Hydraulic, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China.

Department of Automotive Engineering, Clemson University, Greenville, SC 29607, USA.

Publication information

Sensors (Basel). 2016 Dec 13;16(12):2117. doi: 10.3390/s16122117.

DOI:10.3390/s16122117

PMID:27983604

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC5191097/

Abstract

Controlling robots by natural language (NL) is attracting increasing attention because it is versatile, convenient, and requires no extensive training for users. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge in this problem. This paper mainly explores the object grounding problem and, concretely, studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. From the metric information of all segmented objects, the object attributes and the relations between objects are then extracted. NL instructions that incorporate multiple cues for object specification are parsed into domain-specific annotations. The annotations from NL and the information extracted from the RGB-D camera are matched in a computational state estimation framework to search all possible object grounding states. The final grounding is obtained by selecting the states that have the maximum probabilities. An RGB-D scene dataset is collected, associated with different groups of NL instructions based on different cognition levels of the robot. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicality in robotic applications.
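The matching step described in the abstract (score every candidate grounding state against the parsed annotations, then keep the most probable one) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the state estimation framework, so the names `SceneObject`, `attribute_score`, `relation_score`, and `ground`, the 0.9/0.1 match likelihoods, and the three-object scene are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SceneObject:
    """A segmented object with a few attributes (hypothetical schema)."""
    name: str
    color: str
    shape: str
    x: float  # horizontal position in the camera frame, metres (assumed)


def attribute_score(obj, spec, match=0.9, mismatch=0.1):
    """Likelihood that obj fits the attribute cues in spec (a dict).

    The match/mismatch values are illustrative assumptions.
    """
    score = 1.0
    for key, wanted in spec.items():
        score *= match if getattr(obj, key) == wanted else mismatch
    return score


def relation_score(target, reference, relation, match=0.9, mismatch=0.1):
    """Likelihood that the spatial relation holds between two objects."""
    if relation == "left_of":
        holds = target.x < reference.x
    elif relation == "right_of":
        holds = target.x > reference.x
    else:
        raise ValueError(f"unsupported relation: {relation}")
    return match if holds else mismatch


def ground(objects, target_spec, relation=None, reference_spec=None):
    """Enumerate all grounding states and return (best_state, probability)."""
    states = {}
    if relation is None:
        # Single-object grounding: one state per object.
        for obj in objects:
            states[(obj.name,)] = attribute_score(obj, target_spec)
    else:
        # Relational grounding: one state per ordered (target, reference) pair.
        for target, reference in permutations(objects, 2):
            states[(target.name, reference.name)] = (
                attribute_score(target, target_spec)
                * attribute_score(reference, reference_spec)
                * relation_score(target, reference, relation)
            )
    total = sum(states.values())
    probs = {state: value / total for state, value in states.items()}
    best_state = max(probs, key=probs.get)
    return best_state, probs[best_state]


# Toy scene; the annotations stand in for a parse of
# "the red block to the left of the blue cup".
scene = [
    SceneObject("red_block_1", "red", "block", -0.2),
    SceneObject("red_block_2", "red", "block", 0.3),
    SceneObject("blue_cup", "blue", "cup", 0.0),
]
best, prob = ground(
    scene,
    target_spec={"color": "red", "shape": "block"},
    relation="left_of",
    reference_spec={"color": "blue", "shape": "cup"},
)
print(best, round(prob, 3))
```

In this toy scene the attribute cues alone cannot separate the two red blocks; the spatial relation to the cup is what makes `red_block_1` the most probable target, which mirrors the abstract's point about instructions that combine multiple cues.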
