IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5847-5865. doi: 10.1109/TPAMI.2021.3070754. Epub 2022 Aug 4.
We describe a learning-based system that estimates the camera position and orientation from a single input image relative to a known environment. The system is flexible with respect to the amount of information available at test and at training time, catering to different applications. Input images can be RGB-D or RGB, and a 3D model of the environment can be utilized for training but is not necessary. In the minimal case, our system requires only RGB images and ground truth poses at training time, and only a single RGB image at test time. The framework consists of a deep neural network and fully differentiable pose optimization. The neural network predicts so-called scene coordinates, i.e., dense correspondences between the input image and the 3D scene space of the environment. The pose optimization implements robust fitting of pose parameters using differentiable RANSAC (DSAC) to facilitate end-to-end training. The system, an extension of DSAC++ and referred to as DSAC*, achieves state-of-the-art accuracy on various public datasets for RGB-based re-localization, and competitive accuracy for RGB-D based re-localization.
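The pipeline the abstract describes, a network predicting per-pixel scene coordinates (2D-3D correspondences) followed by robust pose fitting with RANSAC, can be illustrated with a toy NumPy sketch. This is a minimal, non-differentiable plain RANSAC with a DLT projection-matrix solver, not the paper's actual DSAC* implementation; the synthetic camera, the 30% outlier rate, and the 2-pixel inlier threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth camera: projection matrix P = K [R | t] (illustrative values).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
P_true = K @ np.hstack([R, t[:, None]])

# "Scene coordinates": the 3D points a network would predict per pixel.
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 3))
Xh = np.hstack([X, np.ones((n, 1))])
x = (P_true @ Xh.T).T
x = x[:, :2] / x[:, 2:3]          # project to 2D pixel coordinates

# Corrupt 30% of correspondences to mimic prediction outliers.
n_out = 60
x[:n_out] += rng.uniform(50.0, 200.0, size=(n_out, 2))

def dlt(X3, x2):
    """Direct Linear Transform: estimate a 3x4 projection from >= 6 points."""
    A = []
    for Xw, xi in zip(X3, x2):
        Xw = np.append(Xw, 1.0)
        u, v = xi
        A.append(np.concatenate([np.zeros(4), -Xw, v * Xw]))
        A.append(np.concatenate([Xw, np.zeros(4), -u * Xw]))
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)   # smallest singular vector solves A p = 0

def reproj_err(P, X3, x2):
    """Per-correspondence reprojection error in pixels."""
    Xh = np.hstack([X3, np.ones((len(X3), 1))])
    p = (P @ Xh.T).T
    p = p[:, :2] / p[:, 2:3]
    return np.linalg.norm(p - x2, axis=1)

# RANSAC: sample minimal sets, keep the hypothesis with the most inliers.
best_inliers = None
for _ in range(200):
    idx = rng.choice(n, size=6, replace=False)
    P = dlt(X[idx], x[idx])
    inliers = reproj_err(P, X, x) < 2.0
    if best_inliers is None or inliers.sum() > best_inliers.sum():
        best_inliers = inliers

# Refine the pose on the consensus set, as a RANSAC pipeline typically does.
P_final = dlt(X[best_inliers], x[best_inliers])
final_err = reproj_err(P_final, X, x)[best_inliers].mean()
print(best_inliers.sum(), final_err)
```

DSAC replaces the hard argmax over hypotheses with a probabilistic selection so that gradients can flow from the pose loss back into the scene-coordinate network; the sketch above keeps only the geometric skeleton of that fitting step.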