Xie Zexiao, Yu Xiaoxuan, Gao Xiang, Li Kunqian, Shen Shuhan
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3395-3415. doi: 10.1109/TNNLS.2022.3201534. Epub 2024 Feb 29.
Depth completion aims to recover pixelwise depth from incomplete and noisy depth measurements, with or without the guidance of a reference RGB image. The task has attracted considerable research interest because of its importance in computer vision applications such as scene understanding, autonomous driving, 3-D reconstruction, object detection, pose estimation, and trajectory prediction. The system input, an incomplete depth map, is usually generated by projecting the 3-D points collected by ranging sensors, such as LiDAR in outdoor environments, or obtained directly from RGB-D cameras in indoor areas. However, even with a high-end LiDAR, the resulting depth maps are still very sparse and noisy, especially near object boundaries, which makes depth completion a challenging problem. Earlier approaches addressed this with conventional image processing techniques that fill holes and remove noise from the relatively dense depth maps obtained by RGB-D cameras. More recently, deep learning-based methods have become increasingly popular and have achieved inspiring results, especially in the challenging setting of LiDAR-image-based depth completion. This article systematically reviews and summarizes work on depth completion in terms of input modalities, data fusion strategies, loss functions, and experimental settings, with particular attention to the key techniques proposed in deep learning-based multiple-input methods. On this basis, we conclude by presenting the current status of depth completion and discussing several prospects for future research.