School of Computer Science and Technology, Shandong University, Qingdao, China; Qingdao Research Institute of Beihang University, Qingdao, China.
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China; Qingdao Research Institute of Beihang University, Qingdao, China.
Neural Netw. 2024 Jun;174:106238. doi: 10.1016/j.neunet.2024.106238. Epub 2024 Mar 16.
Object pose estimation and camera localization are critical in many applications. However, achieving algorithmic universality, i.e., category-level pose estimation and scene-independent camera localization, remains challenging for both techniques. Although the two tasks are closely related through spatial geometric constraints, they require distinct feature extraction. This paper presents a unified RGB-D framework that simultaneously performs category-level object pose estimation and scene-independent camera localization. The framework consists of a pose estimation branch called SLO-ObjNet, a localization branch called SLO-LocNet, a pose confidence calculation process, and object-level optimization. First, initial camera and object results are obtained from SLO-LocNet and SLO-ObjNet. Within these two networks, we design three-level feature fusion modules and a joint loss function to enable feature sharing between the two tasks. The proposed approach then applies a confidence calculation process to assess the accuracy of the estimated object poses. Additionally, an object-level Bundle Adjustment (BA) optimization algorithm is used to further improve the precision of both techniques. The BA algorithm establishes relationships among feature points, objects, and cameras using camera-point, camera-object, and object-point metrics. To evaluate the performance of this approach, experiments are conducted on localization and pose estimation datasets including REAL275, CAMERA25, LineMOD, YCB-Video, 7 Scenes, ScanNet, and TUM RGB-D. The results show that the approach outperforms existing methods in both estimation and localization accuracy. In addition, SLO-LocNet and SLO-ObjNet are trained on ScanNet and tested on the 7 Scenes and TUM RGB-D datasets to demonstrate their universality.
Finally, we also highlight the positive contributions of the fusion modules, the loss function, the confidence calculation process, and the BA optimization to overall performance.
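The abstract does not give the exact objective of the object-level BA; as an illustration only, a joint cost with the three kinds of terms it names (camera-point, camera-object, and object-point) can be sketched as follows, where the notation is ours rather than the paper's: $T_c$ and $T_o$ are camera and object poses, $X_p$ are 3D feature points, $\pi(\cdot)$ is the camera projection, $u_{cp}$ is an observed keypoint, $\hat{T}_{co}$ is an estimated camera-to-object pose, $\hat{X}_{op}$ is a point in the object model frame, and $\rho$ is a robust loss:

```latex
\min_{\{T_c\},\,\{T_o\},\,\{X_p\}}
  \underbrace{\sum_{(c,p)} \rho\!\left(\bigl\| \pi(T_c X_p) - u_{cp} \bigr\|^2\right)}_{\text{camera--point}}
+ \underbrace{\sum_{(c,o)} \rho\!\left(\bigl\| \log\!\bigl(\hat{T}_{co}^{-1}\, T_c^{-1} T_o\bigr)^{\vee} \bigr\|^2\right)}_{\text{camera--object}}
+ \underbrace{\sum_{(o,p)} \rho\!\left(\bigl\| T_o^{-1} X_p - \hat{X}_{op} \bigr\|^2\right)}_{\text{object--point}}
```

Here $\log(\cdot)^{\vee}$ maps a relative $\mathrm{SE}(3)$ error to its 6-vector tangent representation; the paper's actual residual definitions may differ and are given in the full text.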