Max Planck Institute for Informatics, Campus E1 4, 66123 Saarbrücken, Germany.
IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):882-97. doi: 10.1109/TPAMI.2012.174.
Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated for different types of challenging onboard sequences. We first show a substantial improvement to the state of the art in 3D multipeople tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.