Shao Shuwei, Pei Zhongcai, Chen Weihai, Chen Peter C Y, Li Zhengguo
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8883-8899. doi: 10.1109/TPAMI.2024.3411571. Epub 2024 Nov 6.
Over the past few years, monocular depth estimation and completion have received increasing attention from the computer vision community because of their widespread applications. In this paper, we introduce novel physics (geometry)-driven deep learning frameworks for these two tasks by assuming that 3D scenes are composed of piecewise planes. Instead of directly estimating the depth map or completing the sparse depth map, we propose to estimate the surface normal and plane-to-origin distance maps, or to complete the sparse surface normal and distance maps, as intermediate outputs. To this end, we develop a normal-distance head that outputs pixel-level surface normal and distance. The surface normal and distance maps are then regularized by a plane-aware consistency constraint and transformed into depth maps. Furthermore, we integrate an additional depth head to strengthen the robustness of the proposed frameworks. Extensive experiments on the NYU-Depth-v2, KITTI and SUN RGB-D datasets demonstrate that our method outperforms prior state-of-the-art monocular depth estimation and completion methods.
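The abstract does not spell out how surface normal and plane-to-origin distance maps are transformed into a depth map. The sketch below shows one standard way to do it under a pinhole camera model. It is an illustration, not the authors' implementation. It assumes the plane equation n · P = ρ in camera coordinates, with n a unit normal and ρ the plane-to-origin distance. A pixel p = (u, v, 1) back-projects to P = d · K⁻¹p, so d = ρ / (n · K⁻¹p). The function name `normal_distance_to_depth`, the `eps` guard, and the example intrinsics are all hypothetical.

```python
import numpy as np


def normal_distance_to_depth(normal, distance, K, eps=1e-6):
    """Convert per-pixel plane parameters to depth (illustrative sketch).

    normal:   (H, W, 3) unit surface normals in camera coordinates (assumed convention)
    distance: (H, W) plane-to-origin distances, satisfying n . P = distance
    K:        (3, 3) pinhole camera intrinsics
    """
    H, W = distance.shape
    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)
    # Back-projected rays K^{-1} p; their z-component is 1 for a standard K.
    rays = pixels @ np.linalg.inv(K).T
    # n . (K^{-1} p): how fast a point moves along the normal per unit depth.
    denom = np.sum(normal * rays, axis=-1)
    # Rays nearly parallel to the plane have no stable intersection; mark as NaN.
    safe = np.where(np.abs(denom) < eps, np.nan, denom)
    return distance / safe


# Usage: a fronto-parallel plane 2 units in front of the camera.
K = np.array([[500.0, 0.0, 2.0],
              [0.0, 500.0, 1.5],
              [0.0, 0.0, 1.0]])
normal = np.zeros((3, 4, 3))
normal[..., 2] = 1.0
distance = np.full((3, 4), 2.0)
depth = normal_distance_to_depth(normal, distance, K)
```

For the fronto-parallel plane in the usage example, every ray has unit z-component, so the recovered depth equals the plane distance at every pixel. All pixels lying on the same plane share one (n, ρ) pair, which is what makes a piecewise-planar assumption usable as a regularizer.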