He Lei, Wang Guanghui, Hu Zhanyi
IEEE Trans Image Process. 2018 May 17. doi: 10.1109/TIP.2018.2832296.
Learning depth from a single image, an important problem in scene understanding, has attracted much attention in the past decade. The accuracy of depth estimation has improved steadily, from conditional Markov random fields and non-parametric methods to, most recently, deep convolutional neural networks. However, recovering 3D structure from a single 2D image is inherently ambiguous. In this paper, we first prove the ambiguity between the focal length and monocular depth learning, and verify the result experimentally, showing that the focal length has a great influence on accurate depth recovery. To learn monocular depth with the focal length embedded, we propose a method for generating a synthetic varying-focal-length dataset from fixed-focal-length datasets, together with a simple and effective method for filling the holes in the newly generated images. For accurate depth recovery, we propose a novel deep neural network that infers depth by effectively fusing middle-level information on the fixed-focal-length dataset, outperforming state-of-the-art methods built on pretrained VGG. Furthermore, the newly generated varying-focal-length dataset is taken as input to the proposed network in both the learning and inference phases. Extensive experiments on the fixed- and varying-focal-length datasets demonstrate that monocular depth learned with the embedded focal length is significantly more accurate than depth learned without the focal length information.
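The focal-length/depth ambiguity the abstract refers to can be illustrated with the standard pinhole camera model, where a scene point at lateral offset X and depth Z projects to pixel coordinate u = f·X/Z for focal length f. Scaling both the focal length and the depth by the same factor leaves the projection unchanged, so a single image is consistent with many depths unless f is known. The sketch below is an illustration of this geometric fact only, not code from the paper:

```python
def project(f: float, X: float, Z: float) -> float:
    """Pinhole projection: pixel coordinate of a point at offset X, depth Z."""
    return f * X / Z

# A point at depth Z seen with focal length f ...
f, X, Z = 500.0, 2.0, 10.0
u1 = project(f, X, Z)

# ... projects to the same pixel as a point at depth k*Z seen with focal length k*f.
k = 2.0
u2 = project(k * f, X, k * Z)

assert u1 == u2  # identical image observation, different true depths
```

Because the two configurations produce identical observations, a network trained on images alone cannot distinguish them, which is why embedding the focal length as an input helps.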