IEEE Trans Image Process. 2023;32:2749-2760. doi: 10.1109/TIP.2023.3274479. Epub 2023 May 19.
Monocular 3D object detection has drawn increasing attention in human-related applications such as autonomous vehicles because it is cost-effective. However, a monocular image alone inherently contains insufficient information to infer 3D structure. In this paper, we propose a new monocular 3D object detector that can recall stereoscopic visual information about an object given only a left-view monocular image. First, we devise a location embedding module that handles each object with awareness of its location. Next, we devise a Monocular-to-Stereoscopic (M2S) memory that, given the object's appearance in the left-view monocular image, can recall the object's right-view appearance and depth information. To this end, we introduce a stereoscopic vision memorizing loss that guides the M2S memory to store stereoscopic visual information. Furthermore, we propose a binocular vision association loss that guides the M2S memory to associate the left-view and right-view information about the object when estimating depth. As a result, our monocular 3D object detector with the M2S memory can effectively exploit the recalled stereoscopic visual information in the inference phase. Comprehensive experimental results on two public datasets, the KITTI 3D Object Detection Benchmark and the Waymo Open Dataset, demonstrate the effectiveness of the proposed method. We claim that our method is a step forward, following the human ability to recall stereoscopic visual information even when one eye is closed.
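The abstract does not specify how the M2S memory is addressed, so the following is only a minimal sketch of one common way a memory can "recall" stored information: a generic key-value memory read with soft attention. It assumes that keys are matched against a left-view object feature and that each value slot holds a concatenated right-view and depth feature. The names `memory_read`, `keys`, `values`, and `temperature` are hypothetical, and the paper's location embedding module and its two losses are not modeled here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def memory_read(query, keys, values, temperature=0.1):
    """Generic soft key-value memory read (an assumption, not the paper's design).

    query:  left-view object feature (list of floats)
    keys:   one key vector per memory slot, same length as query
    values: one value vector per slot, e.g. right-view + depth features
    Returns the attention-weighted value vector and the addressing weights.
    """
    weights = softmax([cosine(query, k) / temperature for k in keys])
    dim = len(values[0])
    recalled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return recalled, weights

# Toy example with two memory slots (all numbers are illustrative only).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0, 5.0], [0.0, 10.0, 50.0]]
left_feature = [0.9, 0.1]  # closer to the first key
recalled, weights = memory_read(left_feature, keys, values)
```

In this toy setup the query is most similar to the first key, so the recalled vector is dominated by the first value slot. In a trained detector, the stored slots would be learned under supervision that is available only at training time, which is the role the abstract assigns to the stereoscopic vision memorizing loss.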