Graph Convolutional Network for 3D Object Pose Estimation in a Point Cloud.

Author information

Department of Immersive Content Convergence, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea.

Department of Smart Convergence, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea.

Publication information

Sensors (Basel). 2022 Oct 25;22(21):8166. doi: 10.3390/s22218166.

DOI:10.3390/s22218166

PMID:36365864

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9656959/

Abstract

Graph Neural Networks (GNNs) are neural networks that learn representations of nodes and of the edges that connect them to other nodes, while preserving the graph structure. Graph Convolutional Networks (GCNs), a representative family of GNNs, extend the convolution operation of conventional Convolutional Neural Networks (CNNs) to graph-structured data in computer vision. This paper proposes a one-stage GCN approach for 3D object detection and pose estimation that structures the irregularly distributed points of a point cloud as a graph. By spatially structuring the input data into graphs, the network provides the details required to analyze, generate, and estimate bounding boxes. The method introduces a keypoint attention mechanism that aggregates the relative features between points to estimate the category and pose of the object to which each graph vertex belongs, and it supports nine-degrees-of-freedom (9-DoF) multi-object pose estimation. In addition, to avoid gimbal lock in 3D space, rotation is represented with quaternions rather than Euler angles. Experimental results show that aggregating the features of each point and its neighbors in a graph structure improves memory usage and efficiency. Overall, the system achieves performance comparable to state-of-the-art systems.
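Two ideas from the abstract can be illustrated with a minimal, dependency-free Python sketch: structuring a point cloud as a graph and aggregating relative features over neighbors, and representing rotation as a quaternion. This is not the authors' implementation. The function names (`knn_graph`, `aggregate_relative_features`, `quat_to_rotmat`), the k-nearest-neighbor graph construction, the (w, x, y, z) component order, and the use of max-pooling as a stand-in for the keypoint attention mechanism are all assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Return, for each point, the indices of its k nearest neighbors.

    Assumption: edges come from a brute-force k-NN search; the abstract
    does not state how the graph is actually built.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def aggregate_relative_features(points, neighbors):
    """Max-pool the relative offsets (neighbor - center) around each vertex.

    Assumption: plain max-pooling replaces the learned keypoint attention.
    """
    features = []
    for p, nbrs in zip(points, neighbors):
        offsets = [[points[j][d] - p[d] for d in range(3)] for j in nbrs]
        features.append([max(column) for column in zip(*offsets)])
    return features


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any non-zero input is valid.
    """
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
    graph = knn_graph(cloud, k=2)
    print(graph)
    print(aggregate_relative_features(cloud, graph))
    # Quaternion for a 90-degree rotation about the z-axis.
    half = math.pi / 4
    print(quat_to_rotmat((math.cos(half), 0.0, 0.0, math.sin(half))))
```

A unit quaternion has four components and one constraint, so it encodes the three rotational degrees of freedom without the singular configurations of Euler angles. Together with three translation and three size parameters, this would account for the nine degrees of freedom the abstract mentions, although that breakdown is itself an assumption here.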
