IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1219-1231. doi: 10.1109/TPAMI.2020.3025077. Epub 2022 Feb 3.
In this paper we introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes. The proposed disentangling transformation isolates the contribution made by different groups of parameters to a given loss without changing its nature. This brings two advantages: i) it simplifies the training dynamics in the presence of losses with complex interactions between parameters; and ii) it allows us to avoid the issue of balancing independent regression terms. We further apply the disentangling transformation to a novel loss driven by a signed Intersection-over-Union criterion, improving 2D detection results. We also critically review the AP metric used in KITTI3D and resolve a flaw that biased all previously published results on monocular 3D detection. Our improved metric is now used as the official KITTI3D metric. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results. We provide additional results on all classes of the KITTI3D and nuScenes datasets to further validate the robustness of our method, demonstrating its ability to generalize to different object types.
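The core idea of the disentangling transformation described above can be sketched generically: evaluate the *same* loss once per parameter group, each time injecting only that group's prediction into a box otherwise built from ground truth, then sum the per-group terms. The sketch below is a minimal illustration of this idea, not the paper's implementation; the grouping into `dims`/`center` and the toy coupled loss are illustrative assumptions.

```python
def disentangled_loss(loss_fn, pred_groups, gt_groups):
    """For each parameter group k, evaluate the same loss on a box whose
    group k comes from the prediction while every other group is replaced
    by ground truth, then sum the per-group terms. This isolates each
    group's contribution without changing the nature of the loss."""
    total = 0.0
    for k in pred_groups:
        mixed = dict(gt_groups)    # start from the ground-truth box
        mixed[k] = pred_groups[k]  # inject a single predicted group
        total += loss_fn(mixed, gt_groups)
    return total

# Toy coupled loss (illustrative): the two groups interact multiplicatively,
# so their error contributions are entangled in the plain loss.
coupled = lambda a, b: (a["dims"] * a["center"] - b["dims"] * b["center"]) ** 2

pred = {"dims": 2.0, "center": 5.0}
gt   = {"dims": 1.0, "center": 3.0}

plain = coupled(pred, gt)                           # 49.0: errors mixed
disent = disentangled_loss(coupled, pred, gt)       # 9.0 + 4.0 = 13.0
```

In the plain loss the two groups' errors compound inside a single term, whereas the disentangled version attributes a separate, comparable term to each group, which is what simplifies the training dynamics.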
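The AP-metric flaw mentioned above can be illustrated with a small sketch. KITTI3D originally averaged interpolated precision over 11 recall points including recall 0, so a detector that outputs a single confident correct box (near-zero recall, precision 1) already scores AP ≈ 1/11 ≈ 9%; the corrected metric samples 40 recall points and excludes recall 0. The sketch below is a minimal illustration under these assumptions; the degenerate precision/recall values are invented for the example.

```python
import numpy as np

def interpolated_ap(recalls, precisions, sample_points):
    """Interpolated AP: at each sampled recall r, take the maximum
    precision achieved at any recall >= r, then average over samples."""
    ap = 0.0
    for r in sample_points:
        prec_at_r = precisions[recalls >= r]
        ap += (prec_at_r.max() if prec_at_r.size else 0.0) / len(sample_points)
    return ap

# Degenerate detector: one correct, high-confidence detection out of
# many ground-truth objects -> near-zero recall at precision 1.
recalls = np.array([0.01])
precisions = np.array([1.0])

# Original KITTI3D metric: 11 points {0.0, 0.1, ..., 1.0}.
ap_11 = interpolated_ap(recalls, precisions, np.linspace(0.0, 1.0, 11))
# Corrected metric: 40 points {1/40, 2/40, ..., 1.0}, recall 0 excluded.
ap_40 = interpolated_ap(recalls, precisions, np.linspace(0.025, 1.0, 40))
```

Here `ap_11` is 1/11 ≈ 0.0909 purely because of the recall-0 sample, while `ap_40` is 0, showing how the corrected metric removes the bias.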