Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information.

Affiliations

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication Information

Sensors (Basel). 2022 Mar 28;22(7):2576. doi: 10.3390/s22072576.

DOI: 10.3390/s22072576
PMID: 35408191
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9003335/
Abstract

Recently, the research on monocular 3D target detection based on pseudo-LiDAR data has made some progress. In contrast to LiDAR-based algorithms, the robustness of pseudo-LiDAR methods is still inferior. After conducting in-depth experiments, we realized that the main limitations are due to the inaccuracy of the target position and the uncertainty in the depth distribution of the foreground target. These two problems arise from the inaccurate depth estimation. To deal with the aforementioned problems, we propose two innovative solutions. The first is a novel method based on joint image segmentation and geometric constraints, used to predict the target depth and provide the depth prediction confidence measure. The predicted target depth is fused with the overall depth of the scene and results in the optimal target position. For the second, we utilize the target scale, normalized with the Gaussian function, as a priori information. The uncertainty of depth distribution, which can be visualized as long-tail noise, is reduced. With the refined depth information, we convert the optimized depth map into the point cloud representation, called a pseudo-LiDAR point cloud. Finally, we input the pseudo-LiDAR point cloud to the LiDAR-based algorithm to detect the 3D target. We conducted extensive experiments on the challenging KITTI dataset. The results demonstrate that our proposed framework outperforms various state-of-the-art methods by more than 12.37% and 5.34% on the easy and hard settings of the KITTI validation subset, respectively. On the KITTI test set, our framework also outperformed state-of-the-art methods by 5.1% and 1.76% on the easy and hard settings, respectively.
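The last step the abstract describes, converting the optimized depth map into a pseudo-LiDAR point cloud, is the standard pinhole back-projection. The sketch below illustrates that conversion only; the function name and the intrinsics fx, fy, cx, cy are illustrative assumptions, not code from the paper:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W) into an (N, 3)
    pseudo-LiDAR point cloud using the pinhole camera model.
    Pixels with non-positive depth are discarded."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx   # camera-frame x (right)
    y = (v - cy) * z / fy   # camera-frame y (down)
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]
```

The resulting (N, 3) array can then be fed to any LiDAR-based 3D detector, which is exactly the hand-off the pseudo-LiDAR pipeline relies on.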

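The abstract's second idea, using a Gaussian-normalized target scale as a prior to suppress long-tail depth noise, can be pictured as down-weighting depth samples that sit far from a target's central depth. The sketch below is one illustrative reading of that idea, not the authors' actual formulation; the function name and the num_sigma threshold are assumptions:

```python
import numpy as np

def trim_long_tail(box_depths, num_sigma=2.0):
    """Illustrative long-tail suppression: weight a target's depth
    samples with a Gaussian centred on their median, then drop samples
    whose weight places them outside a num_sigma band."""
    d = np.asarray(box_depths, dtype=float)
    mu = np.median(d)                 # robust centre of the target depth
    sigma = d.std() + 1e-6            # spread (epsilon avoids div-by-zero)
    w = np.exp(-0.5 * ((d - mu) / sigma) ** 2)   # Gaussian weights
    keep = w >= np.exp(-0.5 * num_sigma ** 2)    # inside the band
    return d[keep]
```

On a target whose foreground depths cluster tightly while a few background pixels leak in, the leaked samples fall in the tail and are removed before the depth map is converted to points.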

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/1d129ac52b0d/sensors-22-02576-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/1ffdf1641653/sensors-22-02576-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/a6888cb0549f/sensors-22-02576-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/710f988fb322/sensors-22-02576-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/0b2b52794e2d/sensors-22-02576-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/f50b8ea1a230/sensors-22-02576-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/2c7cd3023fdf/sensors-22-02576-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/657867bdad62/sensors-22-02576-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e2d/9003335/5d3b54586083/sensors-22-02576-g009.jpg

Similar Articles

1
Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information.
Sensors (Basel). 2022 Mar 28;22(7):2576. doi: 10.3390/s22072576.
2
MonoDCN: Monocular 3D object detection based on dynamic convolution.
PLoS One. 2022 Oct 4;17(10):e0275438. doi: 10.1371/journal.pone.0275438. eCollection 2022.
3
Transfer Learning Based Semantic Segmentation for 3D Object Detection from Point Cloud.
Sensors (Basel). 2021 Jun 8;21(12):3964. doi: 10.3390/s21123964.
4
Point-Guided Contrastive Learning for Monocular 3-D Object Detection.
IEEE Trans Cybern. 2023 Feb;53(2):954-966. doi: 10.1109/TCYB.2021.3090370. Epub 2023 Jan 13.
5
PLIN: A Network for Pseudo-LiDAR Point Cloud Interpolation.
Sensors (Basel). 2020 Mar 12;20(6):1573. doi: 10.3390/s20061573.
6
Efficient Stereo Depth Estimation for Pseudo-LiDAR: A Self-Supervised Approach Based on Multi-Input ResNet Encoder.
Sensors (Basel). 2023 Feb 2;23(3):1650. doi: 10.3390/s23031650.
7
PTA-Det: Point Transformer Associating Point Cloud and Image for 3D Object Detection.
Sensors (Basel). 2023 Mar 17;23(6):3229. doi: 10.3390/s23063229.
8
EPGNet: Enhanced Point Cloud Generation for 3D Object Detection.
Sensors (Basel). 2020 Dec 4;20(23):6927. doi: 10.3390/s20236927.
9
A Survey on Deep-Learning-Based LiDAR 3D Object Detection for Autonomous Driving.
Sensors (Basel). 2022 Dec 7;22(24):9577. doi: 10.3390/s22249577.
10
Up-Sampling Method for Low-Resolution LiDAR Point Cloud to Enhance 3D Object Detection in an Autonomous Driving Environment.
Sensors (Basel). 2022 Dec 28;23(1):322. doi: 10.3390/s23010322.

Cited By

1
SDC-Net++: End-to-End Crash Detection and Action Control for Self-Driving Car Deep-IoT-Based System.
Sensors (Basel). 2024 Jun 12;24(12):3805. doi: 10.3390/s24123805.
2
Uncertainty Prediction for Monocular 3D Object Detection.
Sensors (Basel). 2023 Jun 7;23(12):5395. doi: 10.3390/s23125395.
