DenseFusion-DA2: End-to-End Pose-Estimation Network Based on RGB-D Sensors and Multi-Channel Attention Mechanisms.

Author information

Li Hanqi, Wan Guoyang, Li Xuna, Wang Chengwen, Zhang Hong, Liu Bingyou

Affiliation

Department of Electrical Engineering, Anhui Polytechnic University, Beijing Road No. 8, Wuhu 241000, China.

Publication information

Sensors (Basel). 2024 Oct 15;24(20):6643. doi: 10.3390/s24206643.

Abstract

6D pose estimation is a critical technology that enables robots to perceive and interact with their operating environment. However, occlusion causes a loss of local features, which in turn limits estimation accuracy. To address this challenge, this paper proposes DenseFusion-DA2, an end-to-end pose-estimation network built on DA2Net, a multi-channel attention mechanism. First, the multi-channel attention mechanism DA2Net was designed using A-Nets as its foundation. The mechanism is constructed in two steps. In the first step, essential features are extracted from the global feature space through second-order attention pooling. In the second step, a feature map is generated by integrating position attention and channel attention. The extracted key features are then assigned to each position of this feature map, enhancing both the feature-representation capacity and the overall performance. Second, the attention mechanism is introduced into both the feature-fusion network and the iterative pose-refinement network to strengthen the network's ability to capture local features, thereby improving its overall performance. The experimental results show that the estimation accuracy of DenseFusion-DA2 on the LineMOD dataset is approximately 3.4% higher than that of DenseFusion. Its estimation accuracy also surpasses that of PoseCNN, PVNet, SSD6D, and PointFusion by 8.3%, 11.1%, 20.3%, and 23.8%, respectively. DenseFusion-DA2 also shows a significant accuracy advantage on the Occluded LineMOD and HR-Vision datasets. This research not only presents a more efficient solution for robot perception but also introduces new ideas and methods for technological advances and applications in related fields.
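The abstract describes the attention block only in outline, so the following pure-Python sketch illustrates the two steps under stated assumptions; it is not the authors' DA2Net code. Step 1 follows the gather-by-second-order-attention-pooling scheme of double attention networks (A²-Nets), which we assume is what "A-Nets" refers to. Step 2 fuses simplified DANet-style position and channel attention by residual summation and uses the fused map to distribute the gathered descriptors to every position. The function name `dual_double_attention`, the projection matrices `w_feat`, `w_gather`, `w_dist`, the omission of learned query/key/value projections, and the summation rule are all hypothetical choices made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major; features are laid out as N positions x C channels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x inner) @ (inner x cols) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_attention(x: Matrix) -> Matrix:
    """Simplified position attention: each position attends to all N positions."""
    attn = [softmax(row) for row in matmul(x, transpose(x))]  # N x N
    return matmul(attn, x)  # N x C


def channel_attention(x: Matrix) -> Matrix:
    """Simplified channel attention: each channel attends to all C channels."""
    attn = [softmax(row) for row in matmul(transpose(x), x)]  # C x C
    return matmul(x, transpose(attn))  # N x C


def dual_double_attention(x: Matrix, w_feat: Matrix, w_gather: Matrix,
                          w_dist: Matrix) -> Matrix:
    """Hypothetical sketch: x is N x C, w_feat is C x M, w_gather and w_dist are C x K."""
    # Step 1: second-order attention pooling. Each of the K gathering maps is
    # softmax-normalised over positions, then used to pool the projected
    # features into one global descriptor (sum over positions of outer products).
    feats = matmul(x, w_feat)  # N x M
    gather = [softmax(col) for col in transpose(matmul(x, w_gather))]  # K x N
    descriptors = matmul(gather, feats)  # K x M

    # Step 2 (assumed combination): residual sum of position and channel attention.
    fused = [[xi + pi + ci for xi, pi, ci in zip(xr, pr, cr)]
             for xr, pr, cr in zip(x, position_attention(x), channel_attention(x))]

    # Distribution: every position softmaxes over the K descriptors and
    # receives their weighted mix, i.e. the key features assigned per position.
    dist = [softmax(row) for row in matmul(fused, w_dist)]  # N x K
    return matmul(dist, descriptors)  # N x M


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N=3 positions, C=2 channels
w_feat = [[1.0, 0.0], [0.0, 1.0]]         # identity projection, M=2
w_gather = [[1.0, -1.0], [0.0, 1.0]]      # K=2 gathering maps
w_dist = [[0.5, 0.0], [0.0, 0.5]]
out = dual_double_attention(x, w_feat, w_gather, w_dist)
print(len(out), len(out[0]))  # 3 2
```

Because both softmax steps produce convex weights, every output value here is a weighted average of the input features, which keeps the toy example easy to check. In a real network the matrices would be learned 1x1 convolutions and the feature map would be far larger; whether DA2Net sums or concatenates the position and channel branches is not stated in the abstract.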

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea39/11511249/697f46f8dfa0/sensors-24-06643-g001.jpg