AMFF-Net: An Effective 3D Object Detector Based on Attention and Multi-Scale Feature Fusion.

Authors

Li Guangping, Mo Zuanfang, Ling Bingo Wing-Kuen

Affiliation

School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China.

Publication

Sensors (Basel). 2023 Nov 22;23(23):9319. doi: 10.3390/s23239319.

DOI:10.3390/s23239319

PMID:38067692

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10708759/

Abstract

With the rise of autonomous driving applications, the importance of 3D object detection on LiDAR point clouds cannot be overstated. Recent studies have shown that methods that aggregate features from voxels can detect objects accurately and efficiently in large, complex 3D scenes. However, most of these methods filter out background points poorly and perform worse on small objects. To address this issue, this paper proposes an Attention-based and Multiscale Feature Fusion Network (AMFF-Net), which uses a Dual-Attention Voxel Feature Extractor (DA-VFE) and a Multi-scale Feature Fusion (MFF) Module to improve the precision and efficiency of 3D object detection. The DA-VFE integrates pointwise and channelwise attention into the Voxel Feature Extractor (VFE) to emphasize the key point cloud information in each voxel and produce more representative voxel features. The MFF Module, built from self-calibrated convolutions, a residual structure, and a coordinate attention mechanism, serves as the 2D backbone. It enlarges the receptive field and captures more contextual information, which helps localize small objects, strengthens the feature-extraction capability of the network, and reduces the computational overhead. We evaluated the proposed model on the nuScenes dataset, which covers a large number of driving scenarios. AMFF-Net achieved 62.8% mAP. Compared with the baseline network, it substantially improved small-object detection and significantly reduced the computational overhead, while the inference speed remained essentially unchanged. AMFF-Net also achieved competitive performance on the KITTI dataset.
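The abstract does not give the DA-VFE equations, so the following is only a minimal NumPy sketch of how pointwise and channelwise attention can be folded into a voxel feature extractor. It assumes a TANet-style design. A pointwise branch takes the max over channels and applies a linear map across the T point slots of a voxel. A channelwise branch takes the max over points and applies a linear map across the C channels. The two branches are combined by an outer product and a sigmoid, and the reweighted points are then averaged per voxel. The function name `dual_attention_vfe`, the single-linear-layer branches, the padded (V, T, C) layout, and the mean aggregation are hypothetical choices for illustration, not the authors' implementation. In a real network the weights would be learned.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dual_attention_vfe(feats, mask, w_point, w_channel):
    """Hypothetical dual-attention VFE (TANet-style sketch, not the paper's code).

    feats:     (V, T, C) point features; V voxels, up to T points, C channels
    mask:      (V, T) bool, True where a real (non-padded) point exists
    w_point:   (T, T) assumed weights of the pointwise attention branch
    w_channel: (C, C) assumed weights of the channelwise attention branch
    returns:   (V, C) one feature vector per voxel
    """
    valid = mask[..., None]                                # (V, T, 1)
    feats = np.where(valid, feats, 0.0)                    # zero out padded points
    masked = np.where(valid, feats, -np.inf)               # padding never wins a max

    # Pointwise branch: describe each point by its max over channels, then mix
    # across the T point slots of the voxel.
    point_desc = np.where(mask, masked.max(axis=2), 0.0)  # (V, T)
    point_att = point_desc @ w_point                       # (V, T)

    # Channelwise branch: describe each channel by its max over the voxel's points.
    chan_desc = masked.max(axis=1)                         # (V, C); -inf if voxel empty
    chan_desc = np.where(np.isfinite(chan_desc), chan_desc, 0.0)
    chan_att = chan_desc @ w_channel                       # (V, C)

    # Combine both branches into a (V, T, C) weight in (0, 1) and reweight features.
    att = sigmoid(point_att[:, :, None] * chan_att[:, None, :])
    refined = feats * att

    # Simple VFE aggregation: mean over the valid points of each voxel.
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    return refined.sum(axis=1) / counts
```

As a quick sanity check, with all-zero weights both branches output 0. Every attention value is then sigmoid(0) = 0.5, and the result is exactly half the plain per-voxel mean of the valid points. Padded slots never influence the output, and an empty voxel yields a zero vector.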