Yu Sheng, Zhai Di-Hua, Zhan Yufeng, Wang Wencai, Guan Yuyin, Xia Yuanqing
IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):11902-11916. doi: 10.1109/TNNLS.2024.3442433.
6-D pose estimation is critical to reliable robotic grasping. The currently prevalent approach relies on keypoint correspondence: it must first define keypoint locations on the object, then detect and localize those keypoints in real scenes, and finally solve for the pose with the random sample consensus (RANSAC)-based perspective-n-point (PnP) algorithm. However, RANSAC-based PnP is nondifferentiable, so the pose loss cannot be backpropagated during training. Direct regression, by contrast, is fast and differentiable but falls short in pose estimation accuracy and thus needs enhancement. To address these gaps, we investigate PPM6D, a new method for 6-D object pose estimation based on regression and point pair matching. Our method begins with a proposed cross-fusion module, designed to fuse and complement RGB features and point cloud features. An attention module then adjusts the features of the object's 3-D model. Finally, we design a point pair matching module that matches points and features effectively, yielding integral matching and fusion. PPM6D is extensively trained and tested on benchmark datasets including LINEMOD, occlusion LINEMOD (LINEMOD-occ), YCB-Video, and T-LESS. Experimental results show that PPM6D outperforms many keypoint-based pose estimation methods while running relatively fast, thereby offering new regression-based pose estimation ideas. Applied to real-world object pose estimation and grasping tasks on an actual Baxter robot, PPM6D demonstrates superior performance compared to most alternatives.
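The abstract does not specify how PPM6D recovers the final pose from matched point pairs; the following is only an illustrative sketch of the standard closed-form step for that subproblem, the Kabsch (SVD-based) least-squares alignment, which recovers a rotation R and translation t from already-matched 3-D point pairs. All function and variable names here are hypothetical, not from the paper.

```python
import numpy as np

def pose_from_point_pairs(src, dst):
    """Recover (R, t) such that dst ≈ R @ src_i + t for matched 3-D point
    pairs, via the Kabsch/SVD least-squares alignment. src, dst: (N, 3)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c = src - src_mean          # center both point sets
    dst_c = dst - dst_mean
    H = src_c.T @ dst_c             # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T              # optimal rotation
    t = dst_mean - R @ src_mean     # optimal translation
    return R, t

# Sanity check with a known pose: rotate 30° about z and translate.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
rng = np.random.default_rng(0)
src = rng.random((50, 3))
dst = src @ R_true.T + t_true
R_est, t_est = pose_from_point_pairs(src, dst)
```

In a learned pipeline like the one the abstract describes, this closed-form step (or a differentiable variant of it) can follow the matching module, since unlike RANSAC-based PnP it admits gradient flow through the SVD.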