State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China.
School of Physical Sciences, University of Science and Technology of China, Hefei 230026, China.
Sensors (Basel). 2019 Sep 21;19(19):4092. doi: 10.3390/s19194092.
To move autonomously and manipulate objects in cluttered indoor environments, a service robot requires 3D scene perception. Although 3D object detection can provide an object-level environmental description to fill this gap, a robot performing detection continuously in a cluttered room encounters incomplete object observations, repeated detections of the same object, detection errors, and intersections between detected objects. To address these problems, we propose a two-stage 3D object detection algorithm that fuses multiple views of 3D object point clouds in the first stage and eliminates unreasonable and intersecting detections in the second stage. For each view, the robot performs 2D object semantic segmentation and obtains the corresponding 3D object point clouds. An unsupervised segmentation method, Locally Convex Connected Patches (LCCP), is then used to separate each object accurately from the background. Subsequently, Manhattan Frame estimation is applied to compute the main orientation of the object, from which the 3D object bounding box is obtained. To handle objects detected in multiple views, we construct an object database and propose an object fusion criterion to maintain it automatically; observations of the same object from multiple views are thus fused, yielding a more accurate bounding box. Finally, we propose an object filtering approach based on prior knowledge to remove incorrect and intersecting objects from the object database. Experiments are carried out on both the SceneNN dataset and a real indoor environment to verify the stability and accuracy of 3D semantic segmentation and object bounding box detection with multi-view fusion.
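The multi-view fusion step described in the abstract (maintaining an object database and merging repeated detections of the same object) can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's actual criterion: it assumes axis-aligned boxes and a 3D IoU threshold for deciding whether a new detection matches an existing database entry, whereas the paper uses Manhattan-Frame-oriented bounding boxes and its own fusion rule.

```python
# Hypothetical sketch of a multi-view object-fusion criterion.
# Assumptions (not from the paper): axis-aligned boxes, IoU-based matching,
# and merging by taking the enclosing box of two observations.
from dataclasses import dataclass

@dataclass
class Box3D:
    # Axis-aligned 3D bounding box given by its min and max corners.
    xmin: float
    ymin: float
    zmin: float
    xmax: float
    ymax: float
    zmax: float

    def volume(self) -> float:
        return (max(0.0, self.xmax - self.xmin) *
                max(0.0, self.ymax - self.ymin) *
                max(0.0, self.zmax - self.zmin))

def iou_3d(a: Box3D, b: Box3D) -> float:
    # Intersection-over-union of two axis-aligned 3D boxes.
    inter = Box3D(max(a.xmin, b.xmin), max(a.ymin, b.ymin), max(a.zmin, b.zmin),
                  min(a.xmax, b.xmax), min(a.ymax, b.ymax), min(a.zmax, b.zmax))
    iv = inter.volume()
    denom = a.volume() + b.volume() - iv
    return iv / denom if denom > 0.0 else 0.0

def merge(a: Box3D, b: Box3D) -> Box3D:
    # Fuse two observations of the same object into one enclosing box.
    return Box3D(min(a.xmin, b.xmin), min(a.ymin, b.ymin), min(a.zmin, b.zmin),
                 max(a.xmax, b.xmax), max(a.ymax, b.ymax), max(a.zmax, b.zmax))

def update_database(db: list, detection: Box3D, iou_thresh: float = 0.25) -> list:
    # Fuse the new detection into the best-overlapping database entry,
    # or append it as a newly observed object.
    best_i, best_iou = -1, 0.0
    for i, obj in enumerate(db):
        v = iou_3d(obj, detection)
        if v > best_iou:
            best_i, best_iou = i, v
    if best_iou >= iou_thresh:
        db[best_i] = merge(db[best_i], detection)
    else:
        db.append(detection)
    return db
```

Under this toy criterion, two slightly shifted views of the same object collapse into a single database entry, while a distant detection becomes a new entry; the paper's filtering stage would then additionally remove entries that are implausible or intersect other objects.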