IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1219-1231. doi: 10.1109/TPAMI.2020.3025077. Epub 2022 Feb 3.
In this paper we introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes. The proposed disentangling transformation isolates the contribution made by different groups of parameters to a given loss without changing its nature. This brings two advantages: i) it simplifies the training dynamics in the presence of losses with complex interactions between parameters; and ii) it allows us to avoid the issue of balancing independent regression terms. We further apply the disentangling transformation to a novel loss driven by a signed Intersection-over-Union criterion, improving 2D detection results. We also critically review the AP metric used in KITTI3D and resolve a flaw that biased all previously published results on monocular 3D detection. Our improved metric is now used as the official KITTI3D metric. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results. We provide additional results on all classes of the KITTI3D and nuScenes datasets to further validate the robustness of our method, demonstrating its ability to generalize to different object types.
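The core idea of the disentangling transformation described above can be sketched generically: evaluate the *same* loss once per parameter group, each time injecting only that group's prediction into a box otherwise built from ground truth, then sum the per-group terms. The sketch below is a minimal illustration of this idea, not the paper's implementation; the grouping into `dims`/`center` and the toy coupled loss are illustrative assumptions.

```python
def disentangled_loss(loss_fn, pred_groups, gt_groups):
    """For each parameter group k, evaluate the same loss on a box whose
    group k comes from the prediction while every other group is replaced
    by ground truth, then sum the per-group terms. This isolates each
    group's contribution without changing the nature of the loss."""
    total = 0.0
    for k in pred_groups:
        mixed = dict(gt_groups)    # start from the ground-truth box
        mixed[k] = pred_groups[k]  # inject a single predicted group
        total += loss_fn(mixed, gt_groups)
    return total

# Toy coupled loss (illustrative): the two groups interact multiplicatively,
# so their error contributions are entangled in the plain loss.
coupled = lambda a, b: (a["dims"] * a["center"] - b["dims"] * b["center"]) ** 2

pred = {"dims": 2.0, "center": 5.0}
gt   = {"dims": 1.0, "center": 3.0}

plain = coupled(pred, gt)                           # 49.0: errors mixed
disent = disentangled_loss(coupled, pred, gt)       # 9.0 + 4.0 = 13.0
```

In the plain loss the two groups' errors compound inside a single term, whereas the disentangled version attributes a separate, comparable term to each group, which is what simplifies the training dynamics.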
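The AP-metric flaw mentioned above can be illustrated with a small sketch. KITTI3D originally averaged interpolated precision over 11 recall points including recall 0, so a detector that outputs a single confident correct box (near-zero recall, precision 1) already scores AP ≈ 1/11 ≈ 9%; the corrected metric samples 40 recall points and excludes recall 0. The sketch below is a minimal illustration under these assumptions; the degenerate precision/recall values are invented for the example.

```python
import numpy as np

def interpolated_ap(recalls, precisions, sample_points):
    """Interpolated AP: at each sampled recall r, take the maximum
    precision achieved at any recall >= r, then average over samples."""
    ap = 0.0
    for r in sample_points:
        prec_at_r = precisions[recalls >= r]
        ap += (prec_at_r.max() if prec_at_r.size else 0.0) / len(sample_points)
    return ap

# Degenerate detector: one correct, high-confidence detection out of
# many ground-truth objects -> near-zero recall at precision 1.
recalls = np.array([0.01])
precisions = np.array([1.0])

# Original KITTI3D metric: 11 points {0.0, 0.1, ..., 1.0}.
ap_11 = interpolated_ap(recalls, precisions, np.linspace(0.0, 1.0, 11))
# Corrected metric: 40 points {1/40, 2/40, ..., 1.0}, recall 0 excluded.
ap_40 = interpolated_ap(recalls, precisions, np.linspace(0.025, 1.0, 40))
```

Here `ap_11` is 1/11 ≈ 0.0909 purely because of the recall-0 sample, while `ap_40` is 0, showing how the corrected metric removes the bias.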