MS3D: A 3D object detection method using multi-scale semantic feature points to construct 3D feature layer.

Affiliation

The School of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou, China.

Publication information

Neural Netw. 2024 Nov;179:106623. doi: 10.1016/j.neunet.2024.106623. Epub 2024 Aug 10.

Abstract

LiDAR point clouds can effectively depict the motion and posture of objects in three-dimensional space. Many studies perform 3D object detection by voxelizing point clouds. However, in autonomous driving scenarios, the sparsity and hollowness of point clouds create difficulties for voxel-based methods: sparsity makes it hard to describe the geometric features of objects, and hollowness hinders the aggregation of 3D features. We propose a two-stage 3D object detection framework called MS3D. (1) We construct the 3D feature layer from voxel feature points drawn from multiple branches, which yields a relatively compact 3D feature layer with rich semantic features. We also propose a distance-weighted sampling method that reduces the loss of foreground points caused by downsampling, so the 3D feature layer retains more foreground points. (2) To address the hollowness of point clouds, we predict the offsets between deep-level feature points and the object's centroid and move these points as close to the centroid as possible, which enables the aggregation of feature points with abundant semantic features. Shallow-level feature points are kept on the object's surface to describe its geometric features. We evaluate the approach on both the KITTI and ONCE datasets.
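
The two ideas named in the abstract, distance-weighted sampling and moving deep-level feature points by a predicted offset, can be illustrated with a minimal sketch. The abstract does not say how distance enters the weighting or how offsets are predicted, so everything here is an assumption made for illustration: the hypothetical `distance_weighted_sample` weights each point by its range from the sensor origin raised to `power` (so sparse, distant points are less likely to be dropped), and the hypothetical `shift_toward_centroid` simply adds offsets that a network would normally predict. It is not the MS3D implementation.

```python
import math
import random


def distance_weighted_sample(points, k, power=1.0, seed=0):
    """Pick k distinct point indices, favouring points far from the sensor.

    Assumption (not from the paper): weight = (range from origin) ** power.
    Uses Efraimidis-Spirakis weighted sampling without replacement.
    """
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    rng = random.Random(seed)
    keyed = []
    for idx, (x, y, z) in enumerate(points):
        dist = math.sqrt(x * x + y * y + z * z)
        weight = max(dist, 1e-9) ** power
        u = 1.0 - rng.random()          # uniform in (0, 1]
        key = u ** (1.0 / weight)       # larger weight -> key closer to 1
        keyed.append((key, idx))
    keyed.sort(reverse=True)            # keep the k largest keys
    return sorted(idx for _, idx in keyed[:k])


def shift_toward_centroid(points, offsets):
    """Move each feature point by its offset (predicted elsewhere; assumed given)."""
    return [
        tuple(p + o for p, o in zip(point, offset))
        for point, offset in zip(points, offsets)
    ]


if __name__ == "__main__":
    # Toy cloud: 50 points about 1 m from the sensor, 50 points about 50 m away.
    near = [(1.0, 0.0, 0.0)] * 50
    far = [(50.0, 0.0, 0.0)] * 50
    picked = distance_weighted_sample(near + far, k=20)
    print("far points kept:", sum(1 for i in picked if i >= 50), "of", len(picked))

    # A deep-level point on the object's surface moved by a stand-in offset.
    print(shift_toward_centroid([(1.0, 2.0, 3.0)], [(0.5, -1.0, 0.0)]))
```

Under these assumptions, raising `power` biases the sample more strongly toward distant points, and `power=0` reduces it to uniform sampling without replacement.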
