
Every Pixel Counts++: Joint Learning of Geometry and Motion with 3D Holistic Understanding

Authors

Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ramakant Nevatia, Alan Yuille

Publication

IEEE Trans Pattern Anal Mach Intell. 2019 Jul 23. doi: 10.1109/TPAMI.2019.2930258.

Abstract

Learning to estimate 3D geometry from a single frame and optical flow from consecutive frames by watching unlabeled videos with deep convolutional networks has made significant progress recently. Current state-of-the-art (SoTA) methods treat the two tasks independently. One important assumption of existing depth estimation methods is that the scene contains no moving objects. In this paper, we propose to address the two tasks as a whole, i.e., to jointly understand per-pixel 3D geometry and motion. This eliminates the need for the static-scene assumption and enforces the inherent geometric consistency during the learning process, yielding significantly improved results for both tasks. We call our method "Every Pixel Counts++" or "EPC++". Various loss terms are formulated to jointly supervise learning across geometric cues, and an effective adaptive training strategy is proposed to achieve better performance. Comprehensive experiments were conducted on datasets covering different scenes, including driving scenarios (KITTI 2012 and KITTI 2015), mixed outdoor/indoor scenes (Make3D), and synthetic animation (the MPI Sintel dataset). Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation, and scene flow estimation shows that our approach outperforms other SoTA methods, demonstrating the effectiveness of each module of the proposed method.
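The joint treatment rests on a standard geometric decomposition: the optical flow of a static pixel is fully determined by its depth and the camera's ego-motion (the "rigid flow"), so pixels whose observed flow deviates from that prediction must belong to independently moving objects. The numpy sketch below illustrates this decomposition under assumed conventions; the intrinsics, pose parameterization, and residual threshold are illustrative choices, not EPC++'s actual implementation.

```python
# Minimal sketch of the rigid-flow / residual-flow decomposition that joint
# depth-and-motion methods such as EPC++ build on. All concrete values here
# (intrinsics K, depth map, pose R, t, threshold) are illustrative assumptions.
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced purely by camera motion over a static scene.

    depth : (H, W) per-pixel depth in frame 1
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) mapping frame-1 camera
            coordinates to frame-2 camera coordinates
    Returns an (H, W, 2) flow field (dx, dy).
    """
    H, W = depth.shape
    # Homogeneous pixel grid of frame 1, shape (3, H*W).
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    # Back-project to 3D, apply the camera motion, re-project.
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    cam2 = R @ cam + t.reshape(3, 1)
    pix2 = K @ cam2
    pix2 = pix2[:2] / pix2[2:3]          # perspective divide
    return (pix2 - pix[:2]).T.reshape(H, W, 2)

def moving_object_mask(full_flow, rig_flow, thresh=1.0):
    """Pixels whose observed flow deviates from the rigid flow are explained
    by independent object motion (a simple residual test)."""
    residual = np.linalg.norm(full_flow - rig_flow, axis=-1)
    return residual > thresh

# Toy usage: a fronto-parallel scene 10 m away; points shift 0.5 m left in
# frame-2 coordinates (i.e. the camera translates right).
H, W = 4, 6
K = np.array([[100.0, 0, W / 2], [0, 100.0, H / 2], [0, 0, 1]])
depth = np.full((H, W), 10.0)
R, t = np.eye(3), np.array([-0.5, 0.0, 0.0])
f_rig = rigid_flow(depth, K, R, t)
print(f_rig[0, 0])  # ~[-5, 0]: uniform leftward flow, fx * tx / Z = -5 px
```

In the paper itself this consistency is enforced through joint loss terms during training rather than by a hard per-pixel threshold; the sketch only makes the underlying geometry explicit.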

