Zhang Zhenyu, Cui Zhen, Xu Chunyan, Jie Zequn, Li Xiang, Yang Jian
IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2608-2623. doi: 10.1109/TPAMI.2019.2926728. Epub 2019 Jul 10.
RGB-D scene understanding with a monocular camera is an emerging and challenging topic with many potential applications. In this paper, we propose a novel Task-Recursive Learning (TRL) framework that jointly and recurrently performs three representative tasks: depth estimation, surface normal prediction, and semantic segmentation. TRL recursively refines the predictions through a series of task-level interactions, where a single cross-task interaction is abstracted as one network block at one time stage. In each stage, we serialize the tasks into a sequence and then recursively perform their interactions. To adaptively enhance counterpart patterns, we encapsulate the interactions in a Task-Attentional Module (TAM) so that the tasks mutually boost one another. Across stages, the historical experience from previous task states is selectively propagated to later stages by a Feature-Selection unit (FS-Unit), which exploits complementary information across tasks. The sequence of task-level interactions also evolves along a coarse-to-fine scale space, so that the required details are refined progressively. Finally, the task-abstracted sequence problem of multi-task prediction is framed as a recursive network. Extensive experiments on the NYU-Depth v2 and SUN RGB-D datasets demonstrate that our method recursively refines the results of the three tasks and achieves state-of-the-art performance.
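The control flow described in the abstract (tasks serialized within a stage, an attention-style interaction between consecutive tasks, gated propagation of earlier states, and a coarse-to-fine progression across stages) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify the internals of the TAM or the FS-Unit, so the sigmoid attention, the convex gate, the weights shared across tasks and stages, the nearest-neighbour upsampling, and every function name below are assumptions chosen only for illustration.

```python
import numpy as np

# Task order within a stage; the actual ordering used in the paper is an assumption here.
TASKS = ("depth", "normal", "segmentation")


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(weight, x):
    # weight: (C_out, C_in), x: (C_in, H, W) -> (C_out, H, W)
    return np.einsum("oc,chw->ohw", weight, x)


def task_attention(w_att, current, counterpart):
    # Hypothetical stand-in for the TAM: an attention map computed from both
    # tasks decides how much of the counterpart feature is added.
    att = sigmoid(conv1x1(w_att, np.concatenate([current, counterpart], axis=0)))
    return current + att * counterpart


def feature_selection(w_gate, history, fresh):
    # Hypothetical stand-in for the FS-Unit: a per-element gate blends the
    # task's earlier state with its newly refined feature.
    gate = sigmoid(conv1x1(w_gate, np.concatenate([history, fresh], axis=0)))
    return gate * history + (1.0 - gate) * fresh


def upsample2x(x):
    # Nearest-neighbour upsampling of a (C, H, W) map, used for the coarse-to-fine step.
    return x.repeat(2, axis=1).repeat(2, axis=2)


def task_recursive_pass(features, num_stages=3, channels=4, seed=0):
    rng = np.random.default_rng(seed)
    # Random, untrained weights shared by all tasks and stages (a simplification).
    w_att = rng.normal(scale=0.1, size=(channels, 2 * channels))
    w_gate = rng.normal(scale=0.1, size=(channels, 2 * channels))

    state = {task: feat.copy() for task, feat in features.items()}
    previous = state[TASKS[-1]]
    for stage in range(num_stages):
        # Within a stage, each task interacts with the most recently updated task.
        for task in TASKS:
            refined = task_attention(w_att, state[task], previous)
            state[task] = feature_selection(w_gate, state[task], refined)
            previous = state[task]
        # Between stages, move every task's state to the next finer scale.
        if stage < num_stages - 1:
            state = {task: upsample2x(feat) for task, feat in state.items()}
            previous = state[TASKS[-1]]
    return state
```

Calling `task_recursive_pass` with a dictionary of three `(4, 2, 2)` feature maps, one per task, returns three `(4, 8, 8)` maps after three stages, since the resolution doubles between consecutive stages. A trained model would learn separate parameters and decode each state into a depth map, a normal map, and a label map; the sketch only shows how the recursion over tasks and stages is wired.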