School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China.
School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China.
Sensors (Basel). 2021 Mar 1;21(5):1692. doi: 10.3390/s21051692.
This paper focuses on 6DoF object pose estimation from a single RGB image. We tackle this challenging problem with a two-stage optimization framework. More specifically, we first introduce a translation estimation module that provides an initial translation based on an estimated depth map. Then, a pose regression module combines the ROI (Region of Interest) and the original image to predict the rotation and refine the translation. Compared with previous end-to-end methods that directly predict rotations and translations, our method can use depth information as weak guidance and significantly reduce the search space for the subsequent module. Furthermore, we design a new loss function for symmetric objects, which are hard cases that prior works struggle to handle. Experiments show that our model achieves state-of-the-art object pose estimation on the YCB-Video (Yale-CMU-Berkeley) dataset.
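The abstract does not spell out the symmetric-object loss. A widely used formulation for symmetric objects in 6DoF pose estimation is an ADD-S-style loss, which matches each transformed model point to its closest ground-truth point rather than to the point with the same index, so that poses related by a symmetry incur no penalty. The sketch below is an illustration of that idea only; the function name, the pose parameterization (rotation matrix plus translation vector), and the use of sampled model points are assumptions, not the paper's definitions.

```python
import torch

def symmetric_pose_loss(pred_R, pred_t, gt_R, gt_t, model_points):
    """ADD-S-style loss sketch (assumed formulation, not the paper's exact loss).

    pred_R, gt_R: (3, 3) rotation matrices
    pred_t, gt_t: (3,) translation vectors
    model_points: (N, 3) points sampled from the object model
    """
    # Transform the model points by the predicted and ground-truth poses.
    pred_pts = model_points @ pred_R.T + pred_t   # (N, 3)
    gt_pts = model_points @ gt_R.T + gt_t         # (N, 3)
    # Pairwise distances between the two transformed point sets.
    dists = torch.cdist(pred_pts, gt_pts)         # (N, N)
    # For each predicted point, take the nearest ground-truth point,
    # making the loss invariant to the object's symmetries.
    return dists.min(dim=1).values.mean()

# Minimal usage example with a dummy object and identical poses (loss ~ 0).
if __name__ == "__main__":
    pts = torch.rand(500, 3)
    R = torch.eye(3)
    t = torch.zeros(3)
    print(symmetric_pose_loss(R, t, R, t, pts))
```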