IEEE Trans Image Process. 2017 Feb;26(2):836-846. doi: 10.1109/TIP.2016.2621673. Epub 2016 Oct 26.
Augmenting RGB data with measured depth has been shown to improve performance on a range of computer vision tasks, including object detection and semantic segmentation. Although depth sensors such as the Microsoft Kinect have made such depth information easy to acquire, the vast majority of images used in vision tasks do not contain depth information. In this paper, we show that augmenting RGB images with estimated depth can also improve the accuracy of both object detection and semantic segmentation. Specifically, we first exploit the recent success of depth estimation from monocular images and train a deep model for depth estimation. Then, we learn deep depth features from the estimated depth and combine them with RGB features for object detection and semantic segmentation. In addition, we propose an RGB-D semantic segmentation method that applies a multi-task training scheme with two tasks: semantic label prediction and depth value regression. We test our methods on several data sets and demonstrate that incorporating information from estimated depth remarkably improves the performance of object detection and semantic segmentation.
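The multi-task training scheme mentioned in the abstract pairs semantic label prediction with depth value regression. The abstract does not specify the loss functions or how the two tasks are weighted, so the following is only a minimal sketch under stated assumptions: per-pixel softmax cross-entropy for the labels, mean squared error for the depth values, and a hypothetical `depth_weight` parameter that balances the two terms. The function names are illustrative and do not come from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one pixel's class scores against its true class index."""
    m = max(logits)  # subtract the max score for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(pixel_logits, pixel_labels, pred_depth, true_depth,
                   depth_weight=1.0):
    """Mean per-pixel cross-entropy plus a weighted mean squared depth error.

    The squared-error form and depth_weight are assumptions for illustration;
    the abstract only states that both tasks are trained together.
    """
    n = len(pixel_labels)
    seg = sum(softmax_cross_entropy(scores, y)
              for scores, y in zip(pixel_logits, pixel_labels)) / n
    dep = sum((p - t) ** 2 for p, t in zip(pred_depth, true_depth)) / n
    return seg + depth_weight * dep
```

In this sketch a single scalar is minimized, so a network sharing features between the two outputs would receive gradients from both the labeling error and the depth error. Setting `depth_weight` to 0 reduces it to ordinary segmentation training.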