University of Science and Technology of China, Hefei, 230027, Anhui, China.
University of Science and Technology of China, Hefei, 230027, Anhui, China; Anhui Province Key Laboratory of Software in Computing and Communication, Hefei, 230027, Anhui, China; USTC-Deqing Alpha Innovation Research Institute, Huzhou, 313299, Zhejiang, China.
Neural Netw. 2023 Sep;166:609-621. doi: 10.1016/j.neunet.2023.07.037. Epub 2023 Jul 31.
Category-level object pose estimation aims to predict the 6D pose and size of arbitrary objects from known categories. It remains challenging due to large intra-class shape variation. Recently, introducing a shape prior adaptation mechanism into the normalized object coordinate space (i.e., NOCS) reconstruction process has been shown to be effective in mitigating intra-class shape variation. However, existing shape prior adaptation methods simply map the observed point cloud to the normalized object space, and the extracted object descriptors are insufficient for perceiving the object pose. As a result, they fail to predict the pose of objects with complex geometric structures (e.g., cameras). To this end, this paper proposes a novel shape prior adaptation method named MSSPA-GC for category-level object pose estimation. Specifically, our main network takes as inputs the observed instance point cloud converted from the RGB-D image and the prior shape point cloud pre-trained on object CAD models. Then, a novel 3D graph convolution network and a PointNet-like MLP network are designed to extract pose-aware object features and shape-aware object features from these two inputs, respectively. After that, the two-stream object features are aggregated through a multi-scale feature propagation mechanism to generate comprehensive 3D object descriptors that maintain both pose-sensitive geometric stability and intra-class shape consistency. Finally, by leveraging object descriptors aware of both object pose and shape when reconstructing the NOCS coordinates, our approach achieves state-of-the-art performance on the widely used REAL275 and CAMERA25 datasets with only 25% of the parameters of existing shape prior adaptation models. Moreover, our method also exhibits decent generalization ability on the unconstrained REDWOOD75 dataset.
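The two-stream pipeline described above can be sketched schematically. This is a minimal, purely illustrative NumPy sketch, not the authors' implementation: the feature extractors, the pooling-based "multi-scale" propagation, the scale choices, and the linear NOCS regression head are all hypothetical stand-ins for the 3D graph convolution stream, the PointNet-like MLP stream, and the learned aggregation and reconstruction modules of MSSPA-GC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N observed points, M prior points, per-scale feature dim D.
N, M, D = 1024, 1024, 64

observed_pts = rng.normal(size=(N, 3))  # instance point cloud from the RGB-D image
prior_pts = rng.normal(size=(M, 3))     # category shape prior point cloud

def extract(points, weights):
    """Toy per-point feature extractor (placeholder for either stream)."""
    return np.tanh(points @ weights)

# Stand-ins for the pose-aware (3D graph conv) and shape-aware (PointNet-like
# MLP) streams; random projections here, purely illustrative.
W_pose = rng.normal(size=(3, D))
W_shape = rng.normal(size=(3, D))
pose_feat = extract(observed_pts, W_pose)   # (N, D) pose-aware features
shape_feat = extract(prior_pts, W_shape)    # (M, D) shape-aware features

# Multi-scale feature propagation, sketched as pooling the shape stream over
# several subset sizes ("scales") and broadcasting each pooled summary back
# to every observed point before concatenation.
scales = [M, M // 4, M // 16]
propagated = [
    np.broadcast_to(shape_feat[:s].mean(axis=0), (N, D)) for s in scales
]
descriptor = np.concatenate([pose_feat] + propagated, axis=1)  # (N, 4*D)

# Final head: regress per-point NOCS coordinates from the fused descriptor.
W_nocs = rng.normal(size=(descriptor.shape[1], 3)) * 0.01
nocs_pred = descriptor @ W_nocs  # (N, 3) canonical-space coordinates

print(descriptor.shape, nocs_pred.shape)
```

In the actual method, the fused descriptors would feed a learned NOCS reconstruction module, and the predicted canonical coordinates would then be aligned with the observed points (e.g., via a similarity transform) to recover 6D pose and size.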