Lin Xiao, Sánchez-Escobedo Dalila, Casas Josep R, Pardàs Montse
Visual Interactions and Communication Technologies (Vicomtech), 20009 Donostia/San Sebastián, Spain.
Image Processing Group, TSC Department, Technical University of Catalonia (UPC), 08034 Barcelona, Spain.
Sensors (Basel). 2019 Apr 15;19(8):1795. doi: 10.3390/s19081795.
Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. These two tasks are commonly addressed independently, but the idea of merging them into a single framework has recently been studied, under the assumption that two highly correlated tasks can benefit each other and improve estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed from a single RGB input image within a unified convolutional neural network. We analyze two different architectures to evaluate which features are most useful when shared by the two tasks and which should be kept separate to achieve a mutual improvement. Our approaches are evaluated under two different scenarios designed to compare our results against both single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms state-of-the-art single-task approaches, while obtaining competitive results compared with other multi-task methods.
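To illustrate the kind of joint architecture the abstract describes, the following is a minimal sketch of a multi-task CNN in PyTorch: a shared encoder extracts features from a single RGB image, and two task-specific heads produce per-pixel class scores and a depth map. The class/module names, layer sizes, class count, and losses here are assumptions for illustration only, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class JointSegDepthNet(nn.Module):
    """Illustrative multi-task CNN: a shared encoder feeds two task-specific
    decoder heads, one for semantic segmentation and one for depth estimation.
    (Hypothetical sketch, not the paper's exact architecture.)"""

    def __init__(self, num_classes: int = 40):
        super().__init__()
        # Shared feature extractor (features reused by both tasks).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific branch: per-pixel semantic class scores.
        self.seg_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, kernel_size=1),
        )
        # Task-specific branch: single-channel depth map.
        self.depth_head = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, rgb: torch.Tensor):
        shared = self.encoder(rgb)  # features shared by both tasks
        return self.seg_head(shared), self.depth_head(shared)


if __name__ == "__main__":
    net = JointSegDepthNet(num_classes=40)
    x = torch.randn(1, 3, 240, 320)  # single RGB input image
    seg_logits, depth = net(x)
    # Joint training sums a segmentation loss and a depth regression loss.
    seg_target = torch.randint(0, 40, (1, 240, 320))
    depth_target = torch.rand(1, 1, 240, 320)
    loss = nn.CrossEntropyLoss()(seg_logits, seg_target) + nn.L1Loss()(depth, depth_target)
    loss.backward()
```

In this kind of design, the choice of where to split the shared trunk into task-specific branches determines which features are shared and which remain separate, which is the design question the two architectures in the paper are meant to probe.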