End-to-End Implicit Object Pose Estimation

Authors

Cao Chen, Yu Baocheng, Xu Wenxia, Chen Guojun, Ai Yuming

Affiliation

School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430073, China.

Publication

Sensors (Basel). 2024 Sep 3;24(17):5721. doi: 10.3390/s24175721.

DOI:10.3390/s24175721

PMID:39275632

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11398108/

Abstract

To accurately estimate the 6D pose of objects, most methods employ a two-stage algorithm. While such two-stage algorithms achieve high accuracy, they are often slow. Additionally, many approaches utilize encoding-decoding to obtain the 6D pose, with many employing bilinear sampling for decoding. However, bilinear sampling tends to sacrifice the accuracy of precise features. In our research, we propose a novel solution that utilizes implicit representation as a bridge between discrete feature maps and continuous feature maps. We represent the feature map as a coordinate field, where each coordinate pair corresponds to a feature value. These feature values are then used to estimate feature maps of arbitrary scales, replacing upsampling for decoding. We apply the proposed implicit module to a bidirectional fusion feature pyramid network. Based on this implicit module, we propose three network branches: a class estimation branch, a bounding box estimation branch, and the final pose estimation branch. For this pose estimation branch, we propose a miniature dual-stream network, which estimates object surface features and complements the relationship between 2D and 3D. We represent the rotation component using the SVD (Singular Value Decomposition) representation method, resulting in a more accurate object pose. We achieved satisfactory experimental results on the widely used 6D pose estimation benchmark dataset Linemod. This innovative approach provides a more convenient solution for 6D object pose estimation.
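The abstract states that the rotation component uses an SVD representation but gives no details. As a minimal sketch, assuming the common formulation in which a network emits an unconstrained 3x3 matrix that is then projected onto the nearest valid rotation matrix, the projection could look like the following. The function name `svd_to_rotation` and the random stand-in input are hypothetical and not taken from the paper.

```python
import numpy as np


def svd_to_rotation(m: np.ndarray) -> np.ndarray:
    """Project an arbitrary 3x3 matrix onto the closest rotation matrix.

    "Closest" is meant in the Frobenius-norm sense. This is an assumed
    formulation for illustration, not the paper's confirmed method.
    """
    u, _, vt = np.linalg.svd(m)
    # u @ vt is orthogonal, so its determinant is +1 or -1.
    # Flip the last singular direction when it is -1 (a reflection).
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


# Stand-in for an unconstrained 9-value network output (hypothetical data).
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 3))
R = svd_to_rotation(raw)

# A valid rotation is orthogonal with determinant +1.
orthogonal = np.allclose(R @ R.T, np.eye(3))
proper = np.isclose(np.linalg.det(R), 1.0)
```

The appeal of this kind of representation is that any real 3x3 output maps to a valid rotation, so the regression target needs no explicit orthogonality constraint during training.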
