Predicting Depth from Single RGB Images with Pyramidal Three-Streamed Networks.

Affiliations

School of Technology, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China.

Key Laboratory of State Forestry Administration on Forestry Equipment and Automation, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China.

Publication information

Sensors (Basel). 2019 Feb 6;19(3):667. doi: 10.3390/s19030667.

DOI:10.3390/s19030667

PMID:30736347

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6386885/

Abstract

Predicting depth from a monocular image is an ill-posed and inherently ambiguous problem in computer vision. In this paper, we propose a pyramidal three-streamed network (PTSN) that recovers depth information from a single RGB image. As network input, PTSN uses images arranged in a pyramidal structure, from which it extracts multiresolution features that improve the robustness of the network. The fully connected layer is replaced with fully convolutional layers of a new structure, which reduces the number of network parameters and the computational complexity. We propose a new loss function combining scale-invariant, horizontal-gradient, and vertical-gradient terms, which not only helps predict the depth values but also recovers sharp local contours. We evaluate PTSN on the NYU Depth v2 dataset, and the experimental results show that our depth predictions are more accurate than those of competing methods.
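The abstract names the three loss terms but not their exact form. The following is a minimal sketch, assuming the scale-invariant log-depth term popularized by Eigen et al. (with λ = 0.5) and gradient terms computed as mean squared horizontal and vertical finite differences of the log-depth error. The weights `w_h` and `w_v`, the value of `lam`, and all function names are hypothetical illustrations, not the paper's definitions.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # H x W depth map as nested lists (H, W >= 2 assumed)


def log_diff(pred: Grid, gt: Grid) -> Grid:
    """Per-pixel log-depth error d = log(pred) - log(gt); depths must be > 0."""
    return [[math.log(p) - math.log(g) for p, g in zip(pr, gr)]
            for pr, gr in zip(pred, gt)]


def scale_invariant_loss(d: Grid, lam: float = 0.5) -> float:
    """Assumed Eigen-style term: mean(d^2) - lam * mean(d)^2.

    With lam = 1.0 the term ignores any global scale factor on the prediction.
    """
    flat = [v for row in d for v in row]
    n = len(flat)
    return sum(v * v for v in flat) / n - lam * (sum(flat) / n) ** 2


def gradient_losses(d: Grid) -> Tuple[float, float]:
    """Mean squared horizontal and vertical finite differences of d."""
    h = [row[j + 1] - row[j] for row in d for j in range(len(row) - 1)]
    v = [d[i + 1][j] - d[i][j]
         for i in range(len(d) - 1) for j in range(len(d[0]))]
    return sum(x * x for x in h) / len(h), sum(x * x for x in v) / len(v)


def total_loss(pred: Grid, gt: Grid, lam: float = 0.5,
               w_h: float = 1.0, w_v: float = 1.0) -> float:
    """Hypothetical combined loss: scale-invariant + weighted gradient terms."""
    d = log_diff(pred, gt)
    g_h, g_v = gradient_losses(d)
    return scale_invariant_loss(d, lam) + w_h * g_h + w_v * g_v
```

The gradient terms penalize errors whose log-depth error changes abruptly between neighbouring pixels, which is one plausible way such a loss could encourage sharp local contours; a prediction that is off by a constant factor everywhere contributes nothing to them.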
