College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China.
Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, Zhejiang University, Hangzhou 310000, China.
Sensors (Basel). 2020 Jan 23;20(3):635. doi: 10.3390/s20030635.
As core tasks of scene understanding, semantic segmentation and depth completion play a vital role in many applications such as robot navigation, AR/VR, and autonomous driving. They parse a scene from the perspectives of semantics and geometry, respectively. While great progress has been made on both tasks through deep learning, little work has been done on building a joint model that deeply exploits the inner relationship between them. In this paper, semantic segmentation and depth completion are jointly considered under a multi-task learning framework. By sharing a common encoder and introducing boundary features as internal constraints in the decoders, the two tasks can properly share the information each requires from the other. An extra boundary detection sub-task provides the boundary features and constructs cross-task joint loss functions for network training. The entire network is implemented end-to-end and evaluated with both RGB and sparse depth input. Experiments conducted on synthetic and real scene datasets show that our proposed multi-task CNN model can effectively improve the performance of every single task.
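To make the described structure concrete, below is a minimal sketch, not the authors' implementation, of a multi-task network with a shared encoder and three decoder heads (semantic segmentation, depth completion, and the auxiliary boundary detection sub-task) trained with a weighted joint loss. The class names (MultiTaskNet, SharedEncoder), layer sizes, loss weights, and the use of plain convolutional blocks are all illustrative assumptions; the paper's specific mechanism for injecting boundary features into the two primary decoders is not reproduced here.

```python
# Illustrative sketch only: shared encoder + three task heads + joint loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Encodes concatenated RGB + sparse depth (4 channels) into shared features."""
    def __init__(self, in_ch=4, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


def decoder_head(feat, out_ch):
    """Simple upsampling decoder head; same structure reused for each task."""
    return nn.Sequential(
        nn.ConvTranspose2d(feat, feat // 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(feat // 2, out_ch, 4, stride=2, padding=1),
    )


class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=14):
        super().__init__()
        self.encoder = SharedEncoder(in_ch=4, feat=64)   # shared features: 128 channels
        self.seg_head = decoder_head(128, num_classes)   # semantic segmentation logits
        self.depth_head = decoder_head(128, 1)           # completed (dense) depth
        self.boundary_head = decoder_head(128, 1)        # auxiliary boundary logits

    def forward(self, rgb, sparse_depth):
        shared = self.encoder(torch.cat([rgb, sparse_depth], dim=1))
        return self.seg_head(shared), self.depth_head(shared), self.boundary_head(shared)


def joint_loss(seg_logits, depth_pred, boundary_logits,
               seg_gt, depth_gt, boundary_gt, w=(1.0, 1.0, 0.5)):
    """Weighted sum of per-task losses; weights and loss choices are assumptions."""
    l_seg = F.cross_entropy(seg_logits, seg_gt)
    # Depth loss only on pixels where ground-truth depth is valid (> 0).
    mask = (depth_gt > 0).float()
    l_depth = (mask * (depth_pred - depth_gt).abs()).sum() / mask.sum().clamp(min=1)
    l_bnd = F.binary_cross_entropy_with_logits(boundary_logits, boundary_gt)
    return w[0] * l_seg + w[1] * l_depth + w[2] * l_bnd


if __name__ == "__main__":
    net = MultiTaskNet(num_classes=14)
    rgb = torch.randn(2, 3, 64, 64)
    sparse = torch.rand(2, 1, 64, 64) * (torch.rand(2, 1, 64, 64) > 0.9).float()
    seg, depth, bnd = net(rgb, sparse)
    print(seg.shape, depth.shape, bnd.shape)  # (2,14,64,64), (2,1,64,64), (2,1,64,64)
```

In this sketch the cross-task coupling comes only from the shared encoder and the joint loss; in the paper, the boundary features additionally constrain the segmentation and depth decoders directly.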