3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2018 May;40(5):1259-1272. doi: 10.1109/TPAMI.2017.2706685. Epub 2017 May 19.

Abstract

The goal of this paper is to perform 3D object detection in the context of autonomous driving. Our method generates a set of high-quality 3D object proposals by exploiting stereo imagery. We formulate proposal generation as the minimization of an energy function that encodes object size priors, the placement of objects on the ground plane, and several depth-informed features that reason about free space, point cloud densities, and distance to the ground. We then apply a convolutional neural network (CNN) on top of these proposals to perform object detection. In particular, the CNN exploits context and depth information to jointly regress 3D bounding box coordinates and object pose. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. When combined with the CNN, our approach outperforms all existing results in object detection and orientation estimation for all three KITTI object classes. We also experiment with the setting where LIDAR information is available and show that combining LIDAR and stereo yields the best results.
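The abstract frames proposal generation as minimizing an energy over candidate 3D boxes. The toy sketch below shows how such terms can be combined to score and rank boxes. It is not the authors' implementation, and the CNN stage is not covered. Several choices here are illustrative assumptions:

- the box parameterization `(cx, cz, w, h, l)`;
- a camera at the origin looking along +z, with y pointing up and the ground plane at y = 0;
- the hypothetical `CAR_PRIOR` values and the unit weights;
- the simplified free-space and height-above-ground terms;
- the helper names, all of which are hypothetical.

Each box rests on the ground plane by construction, and lower energy means a better proposal.

```python
import numpy as np

# Hypothetical "car" size prior (width, height, length in metres); illustrative values only.
CAR_PRIOR = np.array([1.6, 1.5, 3.9])


def points_in_box(points, box):
    """Mask of points inside an axis-aligned box whose bottom lies on the ground (y = 0)."""
    cx, cz, w, h, l = box
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (np.abs(x - cx) <= w / 2) & (y >= 0) & (y <= h) & (np.abs(z - cz) <= l / 2)


def free_space_fraction(points, box, n_cols=8):
    """Simplified free-space cue: fraction of the box's x-columns in which the nearest
    observed point (within the box height) lies behind the box's far face, i.e. the
    camera at the origin saw through the box volume. Empty columns count as unknown."""
    cx, cz, w, h, l = box
    edges = np.linspace(cx - w / 2, cx + w / 2, n_cols + 1)
    far_face = cz + l / 2
    free = 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_col = ((points[:, 0] >= lo) & (points[:, 0] < hi)
                  & (points[:, 1] >= 0) & (points[:, 1] <= h) & (points[:, 2] > 0))
        col_z = points[in_col, 2]
        if col_z.size and col_z.min() > far_face:
            free += 1
    return free / n_cols


def energy(points, box, prior=CAR_PRIOR, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of hand-crafted terms (hypothetical weights); lower is better."""
    w_pcd, w_fs, w_size, w_ht = weights
    cx, cz, w, h, l = box
    mask = points_in_box(points, box)
    density = mask.sum() / (w * h * l)  # points per cubic metre
    size_pen = np.sum(((np.array([w, h, l]) - prior) / prior) ** 2)
    inside_y = points[mask, 1]
    # Height-above-ground cue: interior points should spread over the box height,
    # so penalise a mean height far from h / 2 (maximal penalty if the box is empty).
    ht_pen = 1.0 if inside_y.size == 0 else ((inside_y.mean() - h / 2) / (h / 2)) ** 2
    return (-w_pcd * np.log1p(density) + w_fs * free_space_fraction(points, box)
            + w_size * size_pen + w_ht * ht_pen)


def rank_proposals(points, boxes):
    """Sort candidate boxes from lowest (best) to highest energy."""
    return sorted(boxes, key=lambda b: energy(points, b))


def toy_scene():
    """Synthetic cloud: a car-sized cluster centred at z = 10 m plus a back wall at z = 30 m."""
    car = np.stack(np.meshgrid(np.linspace(-0.75, 0.75, 7),
                               np.linspace(0.1, 1.4, 8),
                               np.linspace(8.1, 11.9, 10)), axis=-1).reshape(-1, 3)
    wx, wy = np.meshgrid(np.linspace(-10, 10, 81), np.linspace(0.1, 1.9, 10))
    wall = np.stack([wx.ravel(), wy.ravel(), np.full(wx.size, 30.0)], axis=1)
    return np.vstack([car, wall])


if __name__ == "__main__":
    pts = toy_scene()
    candidates = [(4.0, 10.0, 1.6, 1.5, 3.9),   # empty region, camera sees through it
                  (0.0, 10.0, 1.6, 3.0, 3.9),   # right place, but too tall
                  (0.0, 10.0, 1.6, 1.5, 3.9)]   # matches the cluster and the size prior
    print(rank_proposals(pts, candidates)[0])  # → (0.0, 10.0, 1.6, 1.5, 3.9)
```

Here the candidates are a hand-picked list and the weights are fixed by hand. A practical system would search over many boxes and tune or learn the weights. The sketch only illustrates how the cues can be combined: density rewards occupied boxes, free space penalizes boxes the camera sees through, and the size and height terms penalize implausible shapes.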