Wang Zheng, Tu Hangyao, Qian Yutong, Zhao Yanwei
School of Computer and Computational Sciences, Hangzhou City University, Hangzhou, 310015, China.
School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China.
Sci Rep. 2024 Apr 10;14(1):8410. doi: 10.1038/s41598-024-59152-x.
Six-dimensional (6D) object pose estimation is a key task in robotic manipulation and grasping. Many existing two-stage solutions suffer from slow inference and require extra refinement to handle variations in lighting, sensor noise, object occlusion, and truncation. To address these challenges, this work proposes a decoupled one-stage network (DON6D) for 6D pose estimation that improves inference speed while maintaining accuracy. Specifically, since the RGB images are aligned with the RGB-D images, the proposed DON6D first uses a two-dimensional detection network to locate objects of interest in the RGB-D images. A feature extraction and fusion module then fully extracts color and geometric features, and dual data augmentation is applied to enhance the generalization ability of the model. Finally, the features are fused and passed to an attention residual encoder-decoder, which improves pose estimation performance and yields an accurate 6D pose. The proposed DON6D is evaluated on the LINEMOD and YCB-Video datasets. The results demonstrate that DON6D is superior to several state-of-the-art methods on the ADD(-S) and ADD(-S) AUC metrics.
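The staged pipeline described in the abstract (2D detection of objects of interest, separate color/geometric feature extraction, fusion, and pose decoding) can be sketched as a minimal data-flow skeleton. This is an illustrative sketch only: every function (`detect_objects`, `extract_features`, `fuse_and_decode`, `don6d_pipeline`) and all placeholder computations are hypothetical stand-ins, not the paper's actual network.

```python
import numpy as np

def detect_objects(rgb):
    """Stage 1 (hypothetical stub): a 2D detector would return bounding
    boxes (x1, y1, x2, y2) for objects of interest; here we emit one dummy box."""
    h, w, _ = rgb.shape
    return [(0, 0, w // 2, h // 2)]

def extract_features(rgb_crop, depth_crop):
    """Stage 2 (hypothetical stub): extract color and geometric features
    from the cropped RGB and depth regions separately."""
    color_feat = rgb_crop.mean(axis=(0, 1))                       # placeholder color descriptor
    geom_feat = np.array([depth_crop.mean(), depth_crop.std()])   # placeholder geometry descriptor
    return color_feat, geom_feat

def fuse_and_decode(color_feat, geom_feat):
    """Stages 3-4 (hypothetical stub): fuse the two feature streams and
    decode a 6D pose. The paper uses an attention residual
    encoder-decoder here; we return placeholder rotation/translation."""
    fused = np.concatenate([color_feat, geom_feat])
    R = np.eye(3)               # placeholder 3x3 rotation
    t = fused[:3] / 255.0       # placeholder 3-vector translation
    return R, t

def don6d_pipeline(rgb, depth):
    """One-stage flow: detect, crop aligned RGB/depth, extract, fuse, decode."""
    poses = []
    for (x1, y1, x2, y2) in detect_objects(rgb):
        c, g = extract_features(rgb[y1:y2, x1:x2], depth[y1:y2, x1:x2])
        poses.append(fuse_and_decode(c, g))
    return poses
```

The sketch only shows why a decoupled one-stage design avoids the refinement pass of two-stage methods: each detected region flows straight through feature extraction, fusion, and decoding in a single forward path.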