Zhang Anmei, Sun Jian
IEEE Trans Image Process. 2021;30:3419-3433. doi: 10.1109/TIP.2021.3061901. Epub 2021 Mar 9.
Depth estimation and defocus estimation are two fundamental tasks in computer vision. Recently, many methods have explored these two tasks separately, leveraging the powerful feature-learning ability of deep learning, and have achieved impressive progress. However, because densely labeling depth and defocus on real images is difficult, these methods are mostly trained on synthetic datasets, and the performance of the learned networks degrades significantly on real images. In this paper, we tackle a new task: jointly estimating depth and defocus from a single image. We design a dual network with two subnets that estimate depth and defocus, respectively. The network is jointly trained on a synthetic dataset with a physical constraint that enforces consistency between depth and defocus. Moreover, we design a simple method to label depth and defocus order on a real image dataset, and two novel metrics to measure the accuracy of depth and defocus estimation on real images. Comprehensive experiments demonstrate that joint training with the physical-consistency constraint enables the two subnets to guide each other, effectively improving their depth and defocus estimation performance on a real defocused-image dataset.
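The physical link between depth and defocus that such a consistency constraint can exploit is the thin-lens circle-of-confusion model. The sketch below is illustrative only, not the paper's implementation: the camera parameters (`F`, `A`, `S`) and the L1 form of the loss are assumptions for demonstration.

```python
import numpy as np

# Illustrative camera parameters (assumptions, not from the paper), in metres.
F = 0.05   # focal length
A = 0.02   # aperture diameter
S = 2.0    # in-focus distance

def coc_from_depth(depth):
    """Circle-of-confusion diameter implied by scene depth under the
    thin-lens model: c = A * |d - S| / d * F / (S - F)."""
    depth = np.asarray(depth, dtype=float)
    return A * np.abs(depth - S) / depth * F / (S - F)

def physical_consistency_loss(pred_depth, pred_defocus):
    """Mean absolute disagreement between the defocus map predicted by one
    subnet and the defocus implied by the other subnet's depth map."""
    return float(np.mean(np.abs(np.asarray(pred_defocus, dtype=float)
                                - coc_from_depth(pred_depth))))
```

A pixel at the in-focus distance maps to zero defocus, and the loss vanishes exactly when the two predicted maps are physically consistent, which is the property the joint training objective enforces.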