Won Changhee, Ryu Jongbin, Lim Jongwoo
IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):3850-3862. doi: 10.1109/TPAMI.2020.2992497. Epub 2021 Oct 1.
In this paper, we propose a novel end-to-end deep neural network model for omnidirectional depth estimation from a wide-baseline multi-view stereo setup. The images captured with ultra-wide field-of-view cameras on an omnidirectional rig are processed by the feature extraction module, and the deep feature maps are then warped onto concentric spheres swept through all candidate depths using the calibrated camera parameters. The 3D encoder-decoder block takes the aligned feature volume and produces an omnidirectional depth estimate, regularizing uncertain regions with global context information. For more accurate depth estimation, we also propose uncertainty prior guidance in two ways: depth map filtering and guided regularization. In addition, we present large-scale synthetic datasets for training and testing omnidirectional multi-view stereo algorithms. Our datasets consist of 13K ground-truth depth maps and 53K fisheye images captured in four orthogonal directions across various objects and environments. Experimental results show that the proposed method generates excellent results in both synthetic and real-world environments, and that it outperforms the prior art as well as omnidirectional versions of state-of-the-art conventional stereo algorithms.
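To make the spherical sweep step concrete, the sketch below shows one way to build the candidate 3D points on concentric spheres around the rig. This is a minimal illustration, not the authors' implementation: the function name spherical_sweep_grid, the equirectangular angular parameterization, and the inverse-depth spacing between d_min and d_max are all assumptions made here for clarity.

```python
import numpy as np

def spherical_sweep_grid(num_depths, height, width, d_min=0.5, d_max=50.0):
    """Sample 3D points on concentric spheres swept over candidate depths.

    Returns an array of shape (num_depths, height, width, 3) holding the 3D
    point, in the rig frame, for every (depth hypothesis, spherical pixel) pair.
    All parameter choices here are illustrative assumptions.
    """
    # Equirectangular angular grid covering the full sphere.
    phi = np.linspace(-np.pi, np.pi, width, endpoint=False)   # longitude
    theta = np.linspace(-np.pi / 2, np.pi / 2, height)        # latitude
    phi, theta = np.meshgrid(phi, theta)                      # each (H, W)

    # Unit ray direction for every spherical pixel.
    rays = np.stack([np.cos(theta) * np.sin(phi),
                     np.sin(theta),
                     np.cos(theta) * np.cos(phi)], axis=-1)   # (H, W, 3)

    # Sweep radii sampled uniformly in inverse depth (an assumed choice).
    inv_depths = np.linspace(1.0 / d_max, 1.0 / d_min, num_depths)
    radii = 1.0 / inv_depths                                  # (D,)

    # Concentric spheres: scale every ray by each candidate radius.
    return radii[:, None, None, None] * rays[None]            # (D, H, W, 3)
```

In the full pipeline described in the abstract, each of these 3D points would be projected into every calibrated fisheye camera and used to bilinearly sample the extracted feature maps, yielding the aligned feature volume that the 3D encoder-decoder consumes.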