State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.
Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China.
Sensors (Basel). 2019 Feb 28;19(5):1032. doi: 10.3390/s19051032.
In recent years, estimating the 6D pose of object instances with convolutional neural networks (CNNs) has received considerable attention. Depending on whether intermediate cues are used, the relevant literature can be roughly divided into two broad categories: direct methods and two-stage pipelines. In the latter, the CNN regresses intermediate cues, such as 3D object coordinates, semantic keypoints, or virtual control points, rather than pose parameters in the first stage. The object pose is then solved from correspondence constraints constructed with these intermediate cues. In this paper, we focus on the postprocessing of a two-stage pipeline and propose to combine two learning concepts for estimating object pose in challenging scenes: projection grouping on one side, and correspondence learning on the other. We first employ a local-patch-based method to predict projection heatmaps, which denote the confidence distribution of the projections of the 3D bounding box's corners. A projection grouping module is then proposed to remove redundant local maxima from each layer of the heatmaps. Instead of directly feeding 2D-3D correspondences to the perspective-n-point (PnP) algorithm, multiple correspondence hypotheses are sampled from each local maximum and its neighborhood and ranked by a correspondence-evaluation network. Finally, correspondences with higher confidence are selected to determine the object pose. Extensive experiments on three public datasets demonstrate that the proposed framework outperforms several state-of-the-art methods.
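The postprocessing steps described above (peak extraction, removal of redundant maxima, and hypothesis sampling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `local_maxima`, `suppress_redundant`, and `sample_hypotheses`, the threshold, the suppression radius, and the jitter size are all assumptions made for illustration. In particular, a simple greedy distance-based suppression stands in for the projection grouping module, and the correspondence-evaluation network and the PnP solve are omitted.

```python
import math
import random


def local_maxima(heatmap, threshold=0.5):
    """Return (row, col, score) for pixels above `threshold` that are
    strictly greater than all of their in-bounds 8-neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    return peaks


def suppress_redundant(peaks, radius=2.5):
    """Greedy stand-in for projection grouping (an assumption, not the
    paper's module): visit peaks by descending score and keep a peak only
    if it lies farther than `radius` from every peak already kept."""
    kept = []
    for r, c, v in sorted(peaks, key=lambda p: p[2], reverse=True):
        if all(math.hypot(r - kr, c - kc) > radius for kr, kc, _ in kept):
            kept.append((r, c, v))
    return kept


def sample_hypotheses(peaks_per_corner, n_hypotheses=10, jitter=1, seed=0):
    """Build correspondence hypotheses: for each corner layer, pick one
    retained peak at random and perturb it inside a (2*jitter+1)^2 window.
    Assumes every layer has at least one peak. Each hypothesis is a list of
    2D points (row, col), one per 3D bounding-box corner."""
    rng = random.Random(seed)
    hypotheses = []
    for _ in range(n_hypotheses):
        hypothesis = []
        for peaks in peaks_per_corner:
            r, c, _ = rng.choice(peaks)
            hypothesis.append(
                (r + rng.randint(-jitter, jitter),
                 c + rng.randint(-jitter, jitter))
            )
        hypotheses.append(hypothesis)
    return hypotheses
```

In a full pipeline, each sampled hypothesis would then be scored (by the correspondence-evaluation network in the paper), and the highest-ranked 2D-3D correspondences would be passed to a PnP solver to recover the pose.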