Department of Mechanical Engineering and Automation, School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130022, China.
Research Center for Space Optical Engineering, Harbin Institute of Technology, P.O. Box 307, Harbin 150001, China.
Sensors (Basel). 2020 Aug 27;20(17):4856. doi: 10.3390/s20174856.
Accurately sensing the surrounding 3D scene is indispensable for drones or robots to execute path planning and navigation. In this paper, a novel monocular depth estimation method is proposed that first uses a lightweight Convolutional Neural Network (CNN) structure for coarse depth prediction and then refines the coarse depth images with surface normal guidance. Specifically, the coarse depth prediction network is designed as a pre-trained encoder-decoder architecture that describes the 3D structure. For surface normal estimation, the network is designed as a two-stream encoder-decoder structure that hierarchically merges red-green-blue-depth (RGB-D) images to capture more accurate geometric boundaries. With fewer network parameters and a simpler learning structure, the proposed method produces more detailed depth maps than existing state-of-the-art approaches. Moreover, 3D point cloud maps reconstructed from the predicted depth images confirm that our framework can be conveniently adopted as a component of a monocular simultaneous localization and mapping (SLAM) paradigm.
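As a rough illustration of the two network roles described in the abstract, the following is a minimal PyTorch sketch of a lightweight encoder-decoder for coarse depth and a two-stream encoder that merges RGB and coarse-depth features level by level before decoding surface normals. The module names, channel widths, and fusion-by-addition scheme are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch only: a coarse depth encoder-decoder and a two-stream
# RGB-D network for surface normals. Layer widths and fusion scheme are assumed.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class CoarseDepthNet(nn.Module):
    """Lightweight encoder-decoder mapping an RGB image to a coarse depth map."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 32, stride=2)    # 1/2 resolution
        self.enc2 = conv_block(32, 64, stride=2)   # 1/4 resolution
        self.dec1 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            conv_block(64, 32),
        )
        self.dec2 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb):
        return self.dec2(self.dec1(self.enc2(self.enc1(rgb))))

class TwoStreamNormalNet(nn.Module):
    """Two-stream encoder that hierarchically merges RGB and coarse-depth
    features, then decodes a 3-channel surface normal map."""
    def __init__(self):
        super().__init__()
        self.rgb1, self.rgb2 = conv_block(3, 32, 2), conv_block(32, 64, 2)
        self.dep1, self.dep2 = conv_block(1, 32, 2), conv_block(32, 64, 2)
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, rgb, coarse_depth):
        r1, d1 = self.rgb1(rgb), self.dep1(coarse_depth)
        f1 = r1 + d1                      # fusion at 1/2 resolution
        r2, d2 = self.rgb2(f1), self.dep2(d1)
        f2 = r2 + d2                      # fusion at 1/4 resolution
        return nn.functional.normalize(self.dec(f2), dim=1)  # unit normals

rgb = torch.randn(1, 3, 240, 320)
depth = CoarseDepthNet()(rgb)
normals = TwoStreamNormalNet()(rgb, depth)
print(depth.shape, normals.shape)  # (1, 1, 240, 320), (1, 3, 240, 320)
```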
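The 3D point cloud maps mentioned above are typically obtained by back-projecting the predicted depth map through the pinhole camera model; the short sketch below shows this standard step. The intrinsic parameters (fx, fy, cx, cy) are placeholder values and would be replaced by the calibration of the actual camera.

```python
# Sketch of back-projecting a predicted depth map into a 3D point cloud via
# the pinhole camera model. Intrinsics below are placeholder values.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map in metres into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx        # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy        # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep pixels with valid (positive) depth

# Example with a synthetic depth map and placeholder intrinsics.
depth = np.full((480, 640), 2.0)   # a flat surface 2 m away
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)                 # (307200, 3)
```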