Stereoscopic Vision Recalling Memory for Monocular 3D Object Detection.

Publication information

IEEE Trans Image Process. 2023;32:2749-2760. doi: 10.1109/TIP.2023.3274479. Epub 2023 May 19.

Abstract

Monocular 3D object detection has drawn increasing attention in various human-related applications, such as autonomous vehicles, because of its cost-effectiveness. However, a monocular image alone inherently contains insufficient information to infer 3D information. In this paper, we propose a new monocular 3D object detector that can recall the stereoscopic visual information about an object, given a left-view monocular image. First, we devise a location embedding module that handles each object with awareness of its location. Next, given the object appearance in the left-view monocular image, we devise a Monocular-to-Stereoscopic (M2S) memory that can recall the object's right-view appearance and depth information. For this purpose, we introduce a stereoscopic vision memorizing loss that guides the M2S memory to store the stereoscopic visual information. Furthermore, we propose a binocular vision association loss that guides the M2S memory to associate the left-view and right-view information about the object when estimating depth. As a result, our monocular 3D object detector with the M2S memory can effectively exploit the recalled stereoscopic visual information in the inference phase. Comprehensive experimental results on two public datasets, the KITTI 3D Object Detection Benchmark and the Waymo Open Dataset, demonstrate the effectiveness of the proposed method. We claim that our method is a step forward that follows the behavior of humans, who can recall stereoscopic visual information even when one eye is closed.

Similar articles

1
Stereoscopic Vision Recalling Memory for Monocular 3D Object Detection.
IEEE Trans Image Process. 2023;32:2749-2760. doi: 10.1109/TIP.2023.3274479. Epub 2023 May 19.
2
MonoAux: Fully Exploiting Auxiliary Information and Uncertainty for Monocular 3D Object Detection.
Cyborg Bionic Syst. 2024 Mar 27;5:0097. doi: 10.34133/cbsystems.0097. eCollection 2024.
3
Monocular Quasi-Dense 3D Object Tracking.
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1992-2008. doi: 10.1109/TPAMI.2022.3168781. Epub 2023 Jan 6.
4
MDS-Net: Multi-Scale Depth Stratification 3D Object Detection from Monocular Images.
Sensors (Basel). 2022 Aug 18;22(16):6197. doi: 10.3390/s22166197.
5
GAC3D: improving monocular 3D object detection with ground-guide model and adaptive convolution.
PeerJ Comput Sci. 2021 Oct 6;7:e686. doi: 10.7717/peerj-cs.686. eCollection 2021.
6
Evaluation of the monocular depth cue in 3D displays.
Opt Express. 2008 Dec 22;16(26):21415-22. doi: 10.1364/oe.16.021415.
7
MonoFENet: Monocular 3D Object Detection with Feature Enhancement Networks.
IEEE Trans Image Process. 2019 Nov 13. doi: 10.1109/TIP.2019.2952201.
8
Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):341-353. doi: 10.1109/TNNLS.2023.3323560. Epub 2025 Jan 7.
9
MonoDCN: Monocular 3D object detection based on dynamic convolution.
PLoS One. 2022 Oct 4;17(10):e0275438. doi: 10.1371/journal.pone.0275438. eCollection 2022.
