State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China.
EON Reality Pte Ltd, Singapore 138567, Singapore.
Sensors (Basel). 2019 Feb 21;19(4):893. doi: 10.3390/s19040893.
Environmental perception is a vital capability for service robots working in indoor environments over long periods. General 3D reconstruction provides only a low-level geometric description and cannot convey semantics. In contrast, higher-level perception similar to that of humans requires more abstract concepts, such as objects and scenes. Moreover, image-based 2D object detection cannot provide the actual position and size of an object, which are important for a robot's operation. In this paper, we focus on 3D object detection, regressing an object's category, 3D size, and spatial position through a convolutional neural network (CNN). We propose a multi-channel CNN for 3D object detection that fuses three input channels: RGB, depth, and bird's-eye-view (BEV) images. We also propose a method to generate 3D proposals from 2D proposals in the RGB image together with a semantic prior. Training and testing are conducted on the modified NYU V2 dataset and the SUN RGB-D dataset to verify the effectiveness of the algorithm. We also carry out real-world experiments on a service robot, using the proposed 3D object detection method to enhance the robot's environmental perception.
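For intuition, the following is a minimal sketch of the kind of three-channel fusion network the abstract describes, written in PyTorch (the abstract does not specify a framework). The encoder layout, feature dimension, 7-parameter box encoding, and class count are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiChannelFusionNet(nn.Module):
    """Illustrative three-branch fusion network (a sketch, not the paper's exact model):
    each branch encodes one input modality (RGB, depth, BEV) of a 3D proposal, and the
    fused feature drives a category head and a 3D box head (size, position, yaw)."""

    def __init__(self, num_classes=19, feat_dim=256):  # class count is a placeholder
        super().__init__()

        def encoder(in_ch):
            # small convolutional encoder shared in structure across modalities
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim), nn.ReLU(),
            )

        self.rgb_branch = encoder(3)    # RGB crop of the proposal
        self.depth_branch = encoder(1)  # depth crop
        self.bev_branch = encoder(1)    # bird's-eye-view crop
        # fusion by concatenation, then two regression/classification heads
        self.cls_head = nn.Linear(3 * feat_dim, num_classes)  # object category
        self.box_head = nn.Linear(3 * feat_dim, 7)            # x, y, z, w, h, l, yaw

    def forward(self, rgb, depth, bev):
        fused = torch.cat([self.rgb_branch(rgb),
                           self.depth_branch(depth),
                           self.bev_branch(bev)], dim=1)
        return self.cls_head(fused), self.box_head(fused)

# example usage on a single proposal, with each channel resized to 64x64
net = MultiChannelFusionNet()
scores, box = net(torch.randn(1, 3, 64, 64),
                  torch.randn(1, 1, 64, 64),
                  torch.randn(1, 1, 64, 64))
```

Concatenation-based fusion is only one possible design choice here; the paper's actual fusion strategy and branch backbones may differ.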