Zhang Liming, Zhou Xin, Liu Jiaqing, Wang Can, Wu Xinyu
Guangdong Provincial Key Lab of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
University of Science and Technology of China, Hefei, 230026, China.
Sci Rep. 2024 Apr 2;14(1):7801. doi: 10.1038/s41598-024-58590-x.
The six-dimensional (6D) pose estimation task takes a color or depth image of a target object as input and predicts the object's 3D rotation matrix and 3D translation vector in the world coordinate system. Existing methods usually use deep neural networks to predict or regress object poses directly from keypoints. Their prediction accuracy typically varies with how prominent the object's surface shape is and with the object's size. To address this problem, we propose a six-dimensional pose estimation framework based on multi-task parameter sharing (PMP), which incorporates object category information into the pose estimation network through an auxiliary object classification task. First, we extract the image features and point cloud features of the target object separately and fuse them point by point. Then, we share knowledge between the per-keypoint confidence in the pose estimation task and the classification task, select the keypoints with higher confidence, and predict the object pose. Finally, the predicted pose is refined by an iterative optimization network to obtain the final pose. Experimental results on the LineMOD dataset show that the proposed method improves pose estimation accuracy and narrows the gap in prediction accuracy between objects of different shapes. We also evaluated the method on a new dataset of small-scale objects containing object RGBD images and accurate 3D point cloud information. The proposed method was further applied to a grasping experiment on a UR5 robotic arm, where it delivered real-time pose estimation results during the grasping process.
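The point-by-point fusion step described above can be illustrated with a minimal sketch: for each 3D point, the image feature at the pixel the point projects to is concatenated with that point's geometry feature. The function name, feature dimensions, and the use of simple concatenation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_pointwise(img_feats, pt_feats, pixel_idx):
    """Point-wise fusion of image and point cloud features (illustrative).

    img_feats : (H*W, C_img) per-pixel image features, flattened row-major
    pt_feats  : (N, C_pt)    per-point geometry features
    pixel_idx : (N,)         flat pixel index each 3D point projects to
    returns   : (N, C_img + C_pt) fused per-point features
    """
    sampled = img_feats[pixel_idx]                 # gather one image feature per point
    return np.concatenate([sampled, pt_feats], axis=1)

# Toy example: 4 points over a 2x3 image, 8-dim image and 5-dim point features.
rng = np.random.default_rng(0)
img_feats = rng.standard_normal((6, 8))
pt_feats = rng.standard_normal((4, 5))
pixel_idx = np.array([0, 2, 5, 1])
fused = fuse_pointwise(img_feats, pt_feats, pixel_idx)
print(fused.shape)  # (4, 13)
```

In practice the two feature maps would come from learned encoders (e.g. a CNN for the image and a PointNet-style network for the point cloud), and the fused per-point features would feed the keypoint-confidence and classification heads.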