School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
IEEE Trans Pattern Anal Mach Intell. 2012 Aug;34(8):1549-62. doi: 10.1109/TPAMI.2011.228.
In this paper, we go beyond recognizing the actions of individuals and focus on group activities. This focus is motivated by the observation that human actions are rarely performed in isolation; contextual information about what other people in the scene are doing provides a useful cue for understanding high-level activities. We propose a novel framework for recognizing group activities that jointly captures the group activity, the individual person actions, and the interactions among them. Two types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. In particular, we propose three different approaches to modeling person-person interaction. The first approach explores the structure of person-person interaction. Unlike most previous latent structured models, which assume a predefined structure for the hidden layer (e.g., a tree structure), we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. The second approach explores person-person interaction at the feature level. We introduce a new feature representation called the action context (AC) descriptor. The AC descriptor encodes information not only about the action of an individual person in the video, but also about the behavior of other people nearby. The third approach combines the first two. Our experimental results demonstrate the benefit of using contextual information for disambiguating group activities.
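As a rough illustration of the idea behind the AC descriptor, the sketch below concatenates one person's action scores with a pooled summary of the scores of nearby people. The abstract does not specify how the descriptor is built, so everything here is an assumption made for illustration: the function name `action_context_descriptor`, the use of per-person action classifier scores as input, the `radius` cutoff that defines "nearby", and the choice of max pooling over neighbors.

```python
import math
from typing import List, Sequence, Tuple


def action_context_descriptor(
    scores: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    person: int,
    radius: float,
) -> List[float]:
    """Concatenate one person's action scores with pooled scores of nearby people.

    Hypothetical sketch: `scores[i]` holds per-action classifier scores for
    person i, and `positions[i]` is that person's (x, y) location in the frame.
    """
    own = list(scores[person])
    num_actions = len(own)

    # Collect the score vectors of every other person within `radius`
    # (an assumed definition of "nearby").
    neighbors = [
        scores[j]
        for j in range(len(scores))
        if j != person and math.dist(positions[j], positions[person]) <= radius
    ]

    # Max-pool each action score over the neighbors (assumed pooling choice);
    # the context part is all zeros when nobody is nearby.
    context = [
        max((n[k] for n in neighbors), default=0.0) for k in range(num_actions)
    ]

    # Action part followed by context part.
    return own + context


if __name__ == "__main__":
    scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
    print(action_context_descriptor(scores, positions, person=0, radius=2.0))
    print(action_context_descriptor(scores, positions, person=2, radius=2.0))
```

With this layout, the first half of the vector describes what the person is doing and the second half describes what the people around them are doing, so a downstream classifier can use both cues at once.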