
Transfer Learning Based Semantic Segmentation for 3D Object Detection from Point Cloud.

Affiliations

Center for Artificial Intelligence & Autonomous Systems, Kunsan National University, 558 Daehak-ro, Naun 2(i)-dong, Gunsan 54150, Korea.

School of Mechanical Design Engineering, Smart e-Mobility Lab, Center for Artificial Intelligence & Autonomous Systems, Jeonbuk National University, 567, Baekje-daero, Deokjin-gu, Jeonju-si 54896, Korea.

Publication

Sensors (Basel). 2021 Jun 8;21(12):3964. doi: 10.3390/s21123964.

Abstract

Three-dimensional object detection from LiDAR point cloud data is an indispensable part of autonomous driving perception systems. Point cloud-based 3D object detection offers higher accuracy than camera-based detection at night, making it a strong alternative. However, most LiDAR-based 3D object detection methods work in a supervised manner, which means their state-of-the-art performance relies heavily on large-scale, well-labeled datasets, while such annotated datasets are expensive to obtain and only available for limited scenarios. Transfer learning is a promising approach to reduce the requirement for large-scale training data, but existing transfer learning object detectors are primarily designed for 2D rather than 3D object detection. In this work, we utilize 3D point cloud data more effectively by representing the scene as a bird's-eye-view (BEV) image and propose a transfer learning based point cloud semantic segmentation method for 3D object detection. The proposed model minimizes the need for large-scale training datasets and consequently reduces training time. First, a preprocessing stage filters the raw point cloud data into a BEV map within a specific field of view. Second, the transfer learning stage reuses knowledge from a previously learned classification task (for which more training data is available) and generalizes to the semantic segmentation-based 2D object detection task. Finally, the 2D detection results from the BEV image are back-projected into 3D in the postprocessing stage. We verify results on two datasets, the KITTI 3D object detection dataset and the Ouster LiDAR-64 dataset, demonstrating that the proposed method is highly competitive in terms of mean average precision (mAP up to 70%) while still running at more than 30 frames per second (FPS).
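The preprocessing stage described above (filtering the raw point cloud into a BEV map within a field of view) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the crop ranges, 0.1 m grid resolution, and the height/intensity/density channel encoding are common BEV conventions assumed here, not values taken from the paper.

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                      z_range=(-2.0, 1.0), resolution=0.1):
    """Project a LiDAR point cloud of shape (N, 4) with columns
    (x, y, z, intensity) onto a bird's-eye-view grid with three
    channels: max height, max intensity, and log point density."""
    x, y, z, r = points[:, 0], points[:, 1], points[:, 2], points[:, 3]

    # Keep only points inside the field-of-view crop.
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[mask], y[mask], z[mask], r[mask]

    # Discretize metric coordinates into grid cell indices.
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    xi = ((x - x_range[0]) / resolution).astype(np.int32)
    yi = ((y - y_range[0]) / resolution).astype(np.int32)

    bev = np.zeros((h, w, 3), dtype=np.float32)
    # Channel 0: max height per cell, normalized to [0, 1].
    np.maximum.at(bev[:, :, 0], (xi, yi),
                  (z - z_range[0]) / (z_range[1] - z_range[0]))
    # Channel 1: max reflectance (intensity) per cell.
    np.maximum.at(bev[:, :, 1], (xi, yi), r)
    # Channel 2: log-scaled point count per cell, capped at 1.0.
    counts = np.zeros((h, w), dtype=np.float32)
    np.add.at(counts, (xi, yi), 1.0)
    bev[:, :, 2] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))
    return bev
```

The resulting three-channel image can then be fed to an ordinary 2D segmentation network, which is what makes image-domain transfer learning applicable to point cloud data.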


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abbd/8230345/9813443636c0/sensors-21-03964-g001.jpg
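The postprocessing step, back-projecting a 2D detection on the BEV image into 3D, amounts to inverting the grid discretization. A minimal sketch, assuming the same crop ranges and 0.1 m resolution as in the BEV projection above and a fixed object height; the paper's actual height recovery may differ:

```python
import numpy as np

def bev_box_to_3d(px_box, z_min=-2.0, obj_height=1.6,
                  x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                  resolution=0.1):
    """Back-project an axis-aligned BEV box given in pixels as
    (row_min, col_min, row_max, col_max) into a metric 3D box
    (x_min, y_min, z_min, x_max, y_max, z_max) in the LiDAR frame."""
    r0, c0, r1, c1 = px_box
    # Rows map back to x, columns map back to y (inverse of the BEV grid).
    x0 = x_range[0] + r0 * resolution
    x1 = x_range[0] + r1 * resolution
    y0 = y_range[0] + c0 * resolution
    y1 = y_range[0] + c1 * resolution
    # Assumed fixed vertical extent; a real system would estimate height.
    return (x0, y0, z_min, x1, y1, z_min + obj_height)
```

Because the BEV grid preserves metric scale (unlike a perspective camera image), this inverse mapping is a simple affine transform per axis.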
