3D Object Detection under Urban Road Traffic Scenarios Based on Dual-Layer Voxel Features Fusion Augmentation.

Authors

Jiang Haobin, Ren Junhao, Li Aoxue

Affiliations

Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China.

School of Automobile and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China.

Publication

Sensors (Basel). 2024 May 21;24(11):3267. doi: 10.3390/s24113267.

DOI: 10.3390/s24113267

PMID: 38894060

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11174701/

Abstract

To improve the accuracy of detecting objects ahead of intelligent vehicles in urban road scenarios, this paper proposes a dual-layer voxel feature fusion augmentation network (DL-VFFA). It targets object misrecognition caused by partial occlusion or a limited field of view. The network adopts a point cloud voxelization architecture and uses the Mahalanobis distance to associate similar points within neighboring voxel units. It integrates local and global information through weight sharing to extract boundary point information within each voxel unit. The relative position encoding of voxel features is computed with an improved attention Gaussian deviation matrix in point cloud space, so that attention focuses on the relative positions of different voxel sequences within channels. During the fusion of point cloud and image features, learnable weight parameters decouple fine-grained regions, enabling two-layer feature fusion: voxel to voxel, and point cloud to image. Extensive experiments on the KITTI dataset demonstrate the effectiveness of DL-VFFA. Compared with the baseline network SECOND, DL-VFFA performs better in moderate- and hard-difficulty scenarios. Furthermore, compared with the voxel fusion module in MVX-Net, the proposed voxel feature fusion is more accurate and effectively captures fine-grained object features after voxelization. Ablation experiments analyze in depth how each of the three voxel fusion modules in DL-VFFA improves the performance of the baseline detector, and the full model achieves superior results.
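The abstract states that DL-VFFA uses the Mahalanobis distance to associate similar points within neighboring voxel units, but it does not give the exact association rule. The sketch below is one plausible reading, written in plain Python so it runs without extra packages: it estimates the mean and covariance of the points in a center voxel and keeps the points from neighboring voxels whose Mahalanobis distance to that distribution is at most a threshold. The function names (`associate_neighbors`, `covariance3`, and so on), the threshold `tau = 3.0`, and the diagonal regularizer `eps` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = List[List[float]]


def mean3(points: Sequence[Point]) -> List[float]:
    """Centroid of a non-empty list of 3D points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


def covariance3(points: Sequence[Point], eps: float = 1e-6) -> Matrix3:
    """Unbiased 3x3 sample covariance; eps (an assumed regularizer) is added
    to the diagonal so the matrix stays invertible for near-planar voxels."""
    mu = mean3(points)
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - mu[k] for k in range(3)]
        for r in range(3):
            for c in range(3):
                cov[r][c] += d[r] * d[c]
    denom = max(len(points) - 1, 1)
    for r in range(3):
        for c in range(3):
            cov[r][c] /= denom
        cov[r][r] += eps
    return cov


def inverse3(m: Matrix3) -> Matrix3:
    """Closed-form inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[adj[r][col] / det for col in range(3)] for r in range(3)]


def mahalanobis(p: Point, mu: Sequence[float], cov_inv: Matrix3) -> float:
    """Mahalanobis distance sqrt((p - mu)^T S^-1 (p - mu))."""
    d = [p[k] - mu[k] for k in range(3)]
    q = sum(d[r] * cov_inv[r][c] * d[c] for r in range(3) for c in range(3))
    return math.sqrt(max(q, 0.0))


def associate_neighbors(center: Sequence[Point],
                        neighbors: Sequence[Point],
                        tau: float = 3.0) -> List[Point]:
    """Hypothetical rule: keep neighbor-voxel points whose Mahalanobis
    distance to the center voxel's point distribution is <= tau."""
    mu = mean3(center)
    cov_inv = inverse3(covariance3(center))
    return [q for q in neighbors if mahalanobis(q, mu, cov_inv) <= tau]


# Usage: a center voxel with spread 0.4 (variance) along each axis.
center = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
neighbors = [(0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(associate_neighbors(center, neighbors))  # [(0.5, 0.0, 0.0)]
```

Compared with a plain Euclidean radius, the Mahalanobis distance scales each direction by the local covariance, so a neighboring point that lies along the dominant spread of a voxel's points (for example, along a vehicle's side surface) is treated as closer than an equally distant point off that surface.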
