Ji Rongrong, Li Ke, Wang Yan, Sun Xiaoshuai, Guo Feng, Guo Xiaowei, Wu Yongjian, Huang Feiyue, Luo Jiebo
IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2410-2422. doi: 10.1109/TPAMI.2019.2936024. Epub 2019 Aug 20.
In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available. To achieve high regression accuracy, state-of-the-art estimation methods rely on CNNs trained with a large number of image-depth pairs, which are prohibitively costly or even infeasible to acquire. Aiming to break the curse of such expensive data collection, we propose a semi-supervised adversarial learning framework that utilizes only a small number of image-depth pairs in conjunction with a large number of easily available monocular images to achieve high performance. In particular, we use one generator to regress the depth and two discriminators to evaluate the predicted depth: one inspects the image-depth pair, while the other inspects the depth channel alone. These two discriminators provide their feedback to the generator as a loss that drives it toward more realistic and accurate depth predictions. Experiments show that the proposed approach can (1) improve most state-of-the-art models on the NYUD v2 dataset by effectively leveraging additional unlabeled data sources; (2) reach state-of-the-art accuracy when the training set is small, e.g., on the Make3D dataset; (3) adapt well to an unseen new dataset (Make3D in our case) after training on an annotated dataset (KITTI in our case).
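The generator loss described above can be sketched as a supervised regression term on the few labeled pairs plus adversarial feedback from the two discriminators on unlabeled images. The sketch below is a minimal NumPy illustration, not the paper's implementation: the `generator`, `d_pair`, and `d_depth` functions are hypothetical stand-ins for the CNN components, and the weight `lam` is an assumed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(image):
    # Hypothetical stand-in for the CNN depth regressor:
    # here just a channel average, producing one depth channel.
    return image.mean(axis=-1, keepdims=True)

def d_pair(image, depth):
    # Discriminator on the concatenated image-depth pair,
    # returning a realism score in (0, 1).
    x = np.concatenate([image, depth], axis=-1)
    return 1.0 / (1.0 + np.exp(-x.mean()))

def d_depth(depth):
    # Discriminator on the depth channel alone.
    return 1.0 / (1.0 + np.exp(-depth.mean()))

def generator_loss(labeled, unlabeled, lam=0.1):
    # Supervised regression term on the small set of image-depth pairs.
    img_l, gt = labeled
    l_sup = np.mean((generator(img_l) - gt) ** 2)
    # Adversarial feedback from both discriminators on predictions
    # for the large pool of unlabeled monocular images.
    pred_u = generator(unlabeled)
    l_adv = -np.log(d_pair(unlabeled, pred_u)) - np.log(d_depth(pred_u))
    return l_sup + lam * l_adv

# Toy batches: 4 labeled RGB images with depth maps, 16 unlabeled images.
img_l = rng.random((4, 8, 8, 3))
gt = rng.random((4, 8, 8, 1))
img_u = rng.random((16, 8, 8, 3))
print(generator_loss((img_l, gt), img_u))
```

In an actual training loop the two discriminators would be updated in alternation with the generator, as in standard GAN training; the sketch only shows how their scores enter the generator's loss.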