
Weakly Supervised Monocular 3D Object Detection by Spatial-Temporal View Consistency.

Authors

Han Wencheng, Tao Runzhou, Ling Haibin, Shen Jianbing

Publication

IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):84-98. doi: 10.1109/TPAMI.2024.3466915. Epub 2024 Dec 4.

Abstract

Monocular 3D object detection plays a crucial role in the field of self-driving cars, estimating the size and location of objects solely from input images. However, a notable disparity exists between the training and inference of 3D object detectors. This discrepancy arises because, during inference, monocular 3D detectors depend solely on images captured by cameras, while during training these methods require 3D ground truths labeled on point cloud data, which is obtained using specialized devices such as LiDAR. This discrepancy creates a break in the data loop, preventing feedback data from production cars from being used to enhance the robustness of the detectors. To address this issue and close the data loop, we present a weakly supervised solution that trains monocular 3D object detectors using only 2D labels, eliminating the requirement for 3D ground truths. Our approach considers two types of view consistency, spatial and temporal, which play a crucial role in regulating the prediction of 3D bounding boxes. Spatial view consistency is achieved by employing projection and multi-view consistency techniques to guide the optimization of the target's location and size. We leverage temporal viewpoint consistency to provide temporal multi-view image pairs, and we further introduce temporal movement consistency to tackle the challenge of dynamic scenes. With only 2D ground truths, our method achieves performance comparable to fully supervised methods. Additionally, our method can be employed for pre-training and achieves significant improvements when fine-tuned with a small proportion of fully supervised labels.
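The projection-consistency idea described in the abstract can be sketched as follows: project the corners of a predicted 3D box into the image with the camera intrinsics, take the tight 2D extent of the projections, and penalize its gap to the 2D label. This is a minimal illustrative sketch, not the authors' implementation; the function names, the axis-aligned box assumption, and the L1 loss form are all assumptions made here for clarity.

```python
import numpy as np

def corners_3d(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates.

    Illustrative simplification: a real detector would also handle yaw.
    """
    cx, cy, cz = center
    w, h, l = size
    offsets = np.array([[sx * w / 2, sy * h / 2, sz * l / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.array([cx, cy, cz]) + offsets

def project(points, K):
    """Pinhole projection of Nx3 camera-frame points with intrinsics K."""
    uvw = points @ K.T                 # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # divide by depth to get (u, v)

def projection_consistency_loss(center, size, K, box_2d):
    """Mean L1 gap between the projected 3D box's 2D extent and the 2D label.

    box_2d = (u_min, v_min, u_max, v_max), i.e. the weak 2D supervision signal.
    """
    uv = project(corners_3d(center, size), K)
    pred = np.array([uv[:, 0].min(), uv[:, 1].min(),
                     uv[:, 0].max(), uv[:, 1].max()])
    return np.abs(pred - np.asarray(box_2d, dtype=float)).mean()
```

A 3D box whose projection exactly matches the 2D label incurs zero loss, so minimizing this term pulls the predicted location and size toward configurations consistent with the 2D annotation.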

