Monocular visual scene understanding: understanding multi-object traffic scenes.

Affiliation

Max Planck Institute for Informatics, Campus E1 4, 66123 Saarbrücken, Germany.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):882-97. doi: 10.1109/TPAMI.2012.174.

Abstract

Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated for different types of challenging onboard sequences. We first show a substantial improvement to the state of the art in 3D multipeople tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.

Similar articles

Monocular visual scene understanding: understanding multi-object traffic scenes.

IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):882-97. doi: 10.1109/TPAMI.2012.174.

Explicit modeling of human-object interactions in realistic videos.

IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):835-48. doi: 10.1109/TPAMI.2012.175.

Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses.

IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1691-703. doi: 10.1109/TPAMI.2012.67.

Detailed 3D representations for object recognition and modeling.

IEEE Trans Pattern Anal Mach Intell. 2013 Nov;35(11):2608-23. doi: 10.1109/TPAMI.2013.87.

Robust multiperson tracking from a mobile platform.

IEEE Trans Pattern Anal Mach Intell. 2009 Oct;31(10):1831-46. doi: 10.1109/TPAMI.2009.109.

Coupled object detection and tracking from static cameras and moving vehicles.

IEEE Trans Pattern Anal Mach Intell. 2008 Oct;30(10):1683-98. doi: 10.1109/TPAMI.2008.170.

Observing human-object interactions: using spatial and functional compatibility for recognition.

IEEE Trans Pattern Anal Mach Intell. 2009 Oct;31(10):1775-89. doi: 10.1109/TPAMI.2009.83.

C4: a real-time object detection framework.

IEEE Trans Image Process. 2013 Oct;22(10):4096-107. doi: 10.1109/TIP.2013.2270111. Epub 2013 Jun 19.

Multiple target tracking by learning-based hierarchical association of detection responses.

IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):898-910. doi: 10.1109/TPAMI.2012.159.

Tracking pedestrians using local spatio-temporal motion patterns in extremely crowded scenes.

IEEE Trans Pattern Anal Mach Intell. 2012 May;34(5):987-1002. doi: 10.1109/TPAMI.2011.173.

Cited by

Visual Object Recognition with 3D-Aware Features in KITTI Urban Scenes.

Sensors (Basel). 2015 Apr 20;15(4):9228-50. doi: 10.3390/s150409228.
