Estimation of 6D Object Pose Using a 2D Bounding Box.

Authors

Hong Yong, Liu Jin, Jahangir Zahid, He Sheng, Zhang Qing

Affiliations

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China.

College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China.

Publication

Sensors (Basel). 2021 Apr 22;21(9):2939. doi: 10.3390/s21092939.

PMID: 33922124

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8122747/

Abstract

This paper presents an efficient approach to detecting, or estimating, the six-dimensional (6D) pose of objects from an RGB image. A quaternion is used to represent an object's 3D rotation, but q and -q represent the same rotation, while the L2 loss between them is maximal (||q - (-q)||₂ = 2 for a unit quaternion). We therefore define a new quaternion pose loss function that resolves this ambiguity. Building on this loss, we design a new convolutional neural network, named Q-Net, to estimate an object's rotation. Because a rotation quaternion must be a unit vector, Q-Net includes a normalization layer that keeps the pose output on the four-dimensional unit sphere. We also propose a new algorithm, called the Bounding Box Equation, that recovers the 3D translation quickly and effectively from a 2D bounding box. Together, these components estimate both the 3D rotation (R) and the 3D translation (t) from a single RGB image, so any conventional 2D bounding-box detector can be upgraded into a 3D pose estimator. We evaluated our model on the LineMod dataset, and experiments show that our method compares favorably in terms of L2 loss and computational time.
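The abstract does not state the exact form of the new quaternion loss. The minimal NumPy sketch below assumes a common sign-invariant choice, min(||q̂ - q||², ||q̂ + q||²), which is not necessarily the loss used in Q-Net. It shows why plain L2 penalizes -q even though it encodes the same rotation, and how normalizing the raw output (the role the abstract gives to Q-Net's normalization layer) plus a sign-invariant term removes that penalty. The function names are hypothetical.

```python
import numpy as np


def normalize_quaternion(q, eps=1e-12):
    """Project a raw 4-vector onto the unit sphere (analogous to a normalization layer)."""
    q = np.asarray(q, dtype=float)
    return q / max(np.linalg.norm(q), eps)


def l2_loss(q_pred, q_true):
    """Plain squared L2 distance; it treats q and -q as far apart."""
    return float(np.sum((q_pred - q_true) ** 2))


def sign_invariant_loss(q_pred, q_true):
    """Squared L2 distance to the nearer of q_true and -q_true (assumed form).

    For unit quaternions this equals 2 - 2 * |<q_pred, q_true>|.
    """
    return min(l2_loss(q_pred, q_true), l2_loss(q_pred, -q_true))


q = normalize_quaternion([0.2, -0.5, 0.1, 0.8])
print(l2_loss(-q, q))              # about 4.0, the largest possible value for unit vectors
print(sign_invariant_loss(-q, q))  # 0.0, since -q is the same rotation
```

For unit vectors, ||p - q||² = 2 - 2<p, q> and ||p + q||² = 2 + 2<p, q>, so the assumed loss reduces to 2 - 2|<p, q>|, which is smooth away from <p, q> = 0 and cheap to compute inside a network.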
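The abstract does not spell out the Bounding Box Equation. The sketch below assumes a common tight-fit formulation: given the rotation R (for example, converted from a predicted quaternion), the object's 3D extents, and pinhole intrinsics K without skew, each side of the 2D box must touch the projection of one corner of the object's 3D box, which gives four linear equations in t. The function names, the (w, x, y, z) quaternion convention, the example intrinsics and extents, and the brute-force search over corner assignments are all illustrative assumptions, not the authors' implementation.

```python
import itertools

import numpy as np


def rotation_from_quaternion(q):
    """3x3 rotation from a quaternion (w, x, y, z); q and -q give the same matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def box_corners(dims):
    """8 corners of an axis-aligned 3D box centred on the object origin."""
    half = np.asarray(dims, dtype=float) / 2.0
    signs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
    return signs * half  # shape (8, 3)


def project_box(R, t, dims, K):
    """Tight 2D box (xmin, ymin, xmax, ymax) around the projected 3D corners."""
    P = box_corners(dims) @ R.T + t  # corners in the camera frame
    uv = P @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    return np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])


def translation_from_box(box, R, dims, K):
    """Estimate t so that the projected 3D box fits the 2D box tightly."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    xmin, ymin, xmax, ymax = box
    # Normalised image coordinates of the four box edges.
    a = np.array([(xmin - cx) / fx, (xmax - cx) / fx, (ymin - cy) / fy, (ymax - cy) / fy])
    # Edge constraint X_x + t_x = a * (X_z + t_z) is linear in t; the matrix depends only on the edge.
    A = np.array([[1, 0, -a[0]], [1, 0, -a[1]], [0, 1, -a[2]], [0, 1, -a[3]]])
    X = box_corners(dims) @ R.T  # rotated corners, shape (8, 3)
    # Right-hand side for every (edge, corner) pair: x-edges use X_x, y-edges use X_y.
    rhs = np.stack([a[0] * X[:, 2] - X[:, 0],
                    a[1] * X[:, 2] - X[:, 0],
                    a[2] * X[:, 2] - X[:, 1],
                    a[3] * X[:, 2] - X[:, 1]])  # shape (4, 8)
    # Try every assignment of one corner per edge (8**4 = 4096 combinations).
    idx = np.array(list(itertools.product(range(8), repeat=4)))  # (4096, 4)
    B = rhs[np.arange(4), idx]  # B[n, e] = rhs[e, idx[n, e]]
    T = B @ np.linalg.pinv(A).T  # least-squares t for each assignment
    # Keep the candidate in front of the camera whose re-projected box matches best.
    P = X[None, :, :] + T[:, None, :]  # (4096, 8, 3)
    z = P[..., 2]
    valid = np.all(z > 1e-6, axis=1)
    z_safe = np.where(z > 1e-6, z, 1.0)
    u = fx * P[..., 0] / z_safe + cx
    v = fy * P[..., 1] / z_safe + cy
    boxes = np.stack([u.min(axis=1), v.min(axis=1), u.max(axis=1), v.max(axis=1)], axis=1)
    err = np.abs(boxes - np.asarray(box)).sum(axis=1)
    err[~valid] = np.inf
    return T[np.argmin(err)]


# Usage on synthetic data (example intrinsics and object extents in metres).
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
q = np.array([0.9, 0.2, -0.3, 0.25])
R = rotation_from_quaternion(q)
dims = (0.10, 0.08, 0.12)
t_true = np.array([0.05, -0.03, 0.9])
box = project_box(R, t_true, dims, K)
t_est = translation_from_box(box, R, dims, K)
print(np.round(t_est, 6))
```

Because the coefficient matrix depends only on the box edges, all 4,096 corner assignments share one pseudo-inverse, so the search is a single matrix product rather than thousands of separate solves. The rotation helper also illustrates the ambiguity discussed in the abstract: q and -q yield the same R.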
