Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Authors

Kuang Hongwu, Wang Bei, An Jianping, Zhang Ming, Zhang Zehan

Affiliation

Hangzhou Hikvision Digital Technology Co. Ltd, Hangzhou, China.

Publication information

Sensors (Basel). 2020 Jan 28;20(3):704. doi: 10.3390/s20030704.

DOI:10.3390/s20030704

PMID:32012863

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7038507/

Abstract

Object detection in point cloud data is one of the key components of computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network (Voxel-FPN), a novel one-stage 3D object detector that uses only raw data from LIDAR sensors. The core framework consists of an encoder network and a corresponding decoder, followed by a region proposal network. The encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas the decoder fuses multiple feature maps from various scales through a Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method extracts features from point data more effectively and outperforms some baselines on the challenging KITTI-3D benchmark, performing well in both speed and accuracy in real-world scenarios.
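
The bottom-up and top-down flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the learned voxel feature encoders, convolutions, and region proposal network are replaced here by simple point counts and additions, and all names (`voxelize`, `top_down_fuse`, `feature_pyramid`, `base_size`, `levels`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group (x, y, z) points into a sparse bird's-eye-view grid keyed by (ix, iy).

    Assumption: the per-cell feature is just the point count, standing in
    for a learned voxel feature.
    """
    grid = defaultdict(float)
    for x, y, _z in points:
        grid[(floor(x / voxel_size), floor(y / voxel_size))] += 1.0
    return dict(grid)


def top_down_fuse(fine, coarse):
    """Add to each fine cell the feature of the coarse cell that covers it.

    This mimics 2x nearest-neighbour upsampling followed by element-wise
    addition, a common way to merge feature pyramid levels.
    """
    return {
        (ix, iy): value + coarse.get((ix // 2, iy // 2), 0.0)
        for (ix, iy), value in fine.items()
    }


def feature_pyramid(points, base_size=0.5, levels=3):
    # Bottom-up: voxelize at base_size, 2 * base_size, 4 * base_size, ...
    grids = [voxelize(points, base_size * 2 ** k) for k in range(levels)]
    # Top-down: start from the coarsest grid and merge into each finer one.
    fused = [None] * levels
    fused[-1] = grids[-1]
    for k in range(levels - 2, -1, -1):
        fused[k] = top_down_fuse(grids[k], fused[k + 1])
    return grids, fused
```

Calling `feature_pyramid(points)` on a list of `(x, y, z)` tuples returns the per-scale grids and their fused counterparts. In the fused output, every fine cell carries context accumulated from all coarser cells that contain it, which is the intuition behind top-down aggregation.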
