IEEE Trans Pattern Anal Mach Intell. 2016 Oct;38(10):2082-95. doi: 10.1109/TPAMI.2015.2505292. Epub 2015 Dec 3.
Many computer vision tasks are more difficult when tackled without contextual information. For example, in multi-camera tracking, pedestrians may look very different from one camera to another because of varying pose and lighting conditions. Similarly, head direction estimation is challenging in high-angle surveillance video, where human head images are low resolution. Even humans can have trouble without contextual information. In this work, we couple a novel source of contextual information, social grouping, with two important computer vision tasks: multi-target tracking and head pose/direction estimation in surveillance video. These three components are modeled in a probabilistic formulation, and we provide effective solvers. We show that social grouping effectively helps to mitigate visual ambiguities in multi-camera tracking and head pose estimation. We further observe that in single-camera multi-target tracking, social grouping provides a natural high-order association cue that avoids the need for existing complex algorithms for high-order track association. In experiments, we demonstrate improvements with our model over models without social grouping context and over several state-of-the-art approaches on a number of publicly available datasets for tracking, head pose estimation, and group discovery.