Category-Level Object Pose Estimation with Statistic Attention

Author information

Jiang Changhong, Mu Xiaoqiao, Zhang Bingbing, Liang Chao, Xie Mujun

Affiliations

School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun 130012, China.

School of Mechanical and Electrical Engineering, Changchun University of Technology, Changchun 130012, China.

Publication information

Sensors (Basel). 2024 Aug 19;24(16):5347. doi: 10.3390/s24165347.

DOI: 10.3390/s24165347

PMID: 39205041

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11359894/

Abstract

Six-dimensional (6D) object pose estimation is a fundamental problem in computer vision. Recently, category-level object pose estimation methods based on 3D graph convolution (3D-GC) have made significant breakthroughs. However, current methods often fail to capture long-range dependencies, which are crucial for modeling complex and occluded object shapes. Discerning fine-grained differences between objects is also essential. Some existing methods use self-attention mechanisms or Transformer encoder-decoder structures to compensate for the missing long-range dependencies, but they consider only first-order feature information. As a result, they leave more complex information unexplored and neglect fine-grained differences between objects. In this paper, we propose SAPENet. It follows the 3D-GC architecture, but it replaces the 3D-GC layers in the encoder with HS-layers for feature extraction and adds statistical attention to compute higher-order statistical information. In addition, three sub-modules are designed for pose regression, point cloud reconstruction, and bounding box voting. The pose regression module also integrates statistical attention, using higher-order statistics to model geometric relationships and aid regression. Experiments show that our method attains an mAP of 49.5% under the 5°2 cm metric, 3.4 points higher than the baseline model, and achieves state-of-the-art (SOTA) performance on the REAL275 dataset.
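
The 5°2 cm metric cited above can be made concrete with a minimal Python/NumPy sketch of how individual predicted poses might be scored. A prediction counts as correct when its rotation error is within 5° and its translation error is within 2 cm. This is a simplified illustration under stated assumptions: the function names are hypothetical, translations are assumed to be in metres, and inclusive threshold handling is an assumption. The official REAL275 evaluation also matches predictions to ground truth by detection IoU, treats symmetric categories specially, and averages over categories. None of that is reproduced here.

```python
import numpy as np


def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # For a relative rotation by angle theta, trace(R_pred^T R_gt) = 1 + 2 cos(theta).
    cos_theta = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # guard against floating-point drift
    return float(np.degrees(np.arccos(cos_theta)))


def translation_error_cm(t_pred, t_gt) -> float:
    """Euclidean translation error in cm (assumes inputs are in metres)."""
    return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt)) * 100.0)


def within_threshold(R_pred, t_pred, R_gt, t_gt,
                     rot_deg: float = 5.0, trans_cm: float = 2.0) -> bool:
    """Hypothetical helper: True if both errors fall within the thresholds."""
    return (rotation_error_deg(R_pred, R_gt) <= rot_deg
            and translation_error_cm(t_pred, t_gt) <= trans_cm)


def accuracy_5deg_2cm(preds, gts) -> float:
    """Percentage of (R, t) predictions meeting the 5 deg / 2 cm criterion."""
    hits = [within_threshold(Rp, tp, Rg, tg)
            for (Rp, tp), (Rg, tg) in zip(preds, gts)]
    return 100.0 * sum(hits) / len(hits)


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis by `deg` degrees (used only for the demo)."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])


identity, origin = np.eye(3), np.zeros(3)
preds = [(rot_z(3.0), np.array([0.01, 0.0, 0.0])),  # 3 deg, 1 cm: counts as correct
         (rot_z(10.0), origin)]                      # 10 deg: fails the rotation test
gts = [(identity, origin), (identity, origin)]
print(accuracy_5deg_2cm(preds, gts))  # 50.0
```

Under these assumptions, a reported figure such as 49.5 is a percentage of correctly posed instances. Tightening either threshold can only lower it.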
