TU Kaiserslautern, 67663 Kaiserslautern, Germany.
German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
Sensors (Basel). 2021 Jan 5;21(1):300. doi: 10.3390/s21010300.
Estimating and tracking the 6DoF poses of objects in images is a challenging problem of great importance for robotic interaction and augmented reality. Recent approaches that apply deep neural networks to pose estimation have shown encouraging results. However, most of them rely on training with real images of objects, which imposes severe limitations on ground truth pose acquisition, coverage of the full range of possible poses, training dataset scaling, and generalization capability. This paper presents a novel approach using a Convolutional Neural Network (CNN) trained exclusively on single-channel Synthetic images of objects to regress 6DoF object Poses directly (SynPo-Net). The proposed SynPo-Net combines a network architecture designed specifically for pose regression with a domain adaptation scheme that transforms real and synthetic images into an intermediate domain better suited for establishing correspondences. The extensive evaluation shows that our approach significantly outperforms state-of-the-art methods that use synthetic training, in terms of both accuracy and speed. Our system can be used to estimate the 6DoF pose from a single frame, or it can be integrated into a tracking system to provide the initial pose.