Feng Mingtao, Wang Yaonan, Liu Jian, Zhang Liang, Zaki Hasan F M, Mian Ajmal
IEEE Trans Image Process. 2018 Jul;27(7):3586-3598. doi: 10.1109/TIP.2018.2814217. Epub 2018 Mar 9.
Convolutional Neural Networks (CNNs) have performed extremely well on many image analysis tasks. However, supervised training of deep CNN architectures requires large amounts of labelled data, which are unavailable for light field images. In this paper, we leverage synthetic light field images and propose a two-stream CNN that learns to estimate the disparities of multiple correlated neighbourhood pixels from their Epipolar Plane Images (EPIs). Since the EPIs are unrelated except at their intersection, the two-stream network learns convolution weights for each EPI individually and then combines the outputs of the two streams for disparity estimation. The CNN-estimated disparity map is then refined with a variational technique that uses the central RGB light field image as a prior. We also propose a new real-world dataset comprising light field images of 19 objects, captured with the Lytro Illum camera in outdoor scenes, together with their corresponding 3D pointclouds, captured with the 3dMD scanner, as ground truth. This dataset will be made public to allow more precise comparison of algorithms at the 3D pointcloud level, which is currently not possible. Experiments on the synthetic and real-world datasets show that our algorithm outperforms the existing state of the art for depth estimation from light field images.
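To make the EPI inputs described in the abstract concrete, the following is a minimal NumPy sketch of how two EPIs that intersect at a single central-view pixel could be sliced from a 4D light field. It is an illustration under stated assumptions, not the authors' implementation: the array layout `(n_v, n_u, height, width)`, the choice of one horizontal and one vertical EPI, and the names `extract_epis` and `make_shifted_light_field` are all hypothetical. The synthetic scene is a single plane with constant integer disparity, so each EPI row is simply a shifted copy of the central row.

```python
import numpy as np


def make_shifted_light_field(base, n_views, disparity):
    """Build a toy light field of a plane with constant integer disparity.

    Hypothetical helper: each view is the base image circularly shifted
    in proportion to its angular offset from the central view.
    Returns an array of shape (n_views, n_views, height, width).
    """
    centre = n_views // 2
    height, width = base.shape
    lf = np.empty((n_views, n_views, height, width), dtype=base.dtype)
    for v in range(n_views):
        for u in range(n_views):
            shift = (disparity * (v - centre), disparity * (u - centre))
            lf[v, u] = np.roll(base, shift=shift, axis=(0, 1))
    return lf


def extract_epis(lf, y, x):
    """Slice the two EPIs that intersect at pixel (y, x) of the central view.

    Assumed layout: lf[v, u, y, x], with v and u the vertical and
    horizontal angular indices.
    """
    n_v, n_u, _, _ = lf.shape
    vc, uc = n_v // 2, n_u // 2
    epi_h = lf[vc, :, y, :]  # fix row y, vary u and x -> shape (n_u, width)
    epi_v = lf[:, uc, :, x]  # fix column x, vary v and y -> shape (n_v, height)
    return epi_h, epi_v


if __name__ == "__main__":
    base = np.arange(20 * 30, dtype=float).reshape(20, 30)
    lf = make_shifted_light_field(base, n_views=5, disparity=2)
    epi_h, epi_v = extract_epis(lf, y=7, x=11)
    print(epi_h.shape, epi_v.shape)
```

In this toy setting the two slices share only the sample `lf[vc, uc, y, x]`, which mirrors the abstract's remark that the EPIs are unrelated except at their intersection. The shift between neighbouring EPI rows equals the disparity, which is the quantity the network is trained to estimate.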