IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6955-6968. doi: 10.1109/TPAMI.2020.3034233. Epub 2023 May 5.
Group activity recognition (GAR) is the challenging task of recognizing the collective behavior of a group of people. It requires a complex inference process in which visual cues collected from individuals are integrated into a final prediction while accounting for the interactions among them. This paper goes beyond existing approaches by designing a Hierarchical Graph-based Cross Inference Network (HiGCIN), in which three levels of information, i.e., the body-region level, the person level, and the group-activity level, are constructed, learned, and inferred in an end-to-end manner. Specifically, we present a generic Cross Inference Block (CIB), which concurrently captures the latent spatiotemporal dependencies among body regions and persons. Based on the CIB, two modules are designed to extract and refine features for group activities at each level. Experiments on two popular benchmarks verify the effectiveness of our approach, particularly its ability to infer with multilevel visual cues. In addition, training our approach does not require individual action labels, which greatly reduces the labor required for data annotation.
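The abstract does not spell out how the CIB is formulated, so the following is only a rough illustration of the idea of refining each node with both spatial and temporal context at once. It assumes plain dot-product attention with a residual sum and no learned projections; the function names (`attend`, `cross_inference`) and the `features[t][n]` layout are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, neighbors):
    """Average the neighbors, weighted by softmax of their dot product with query."""
    weights = softmax([dot(query, v) for v in neighbors])
    return [
        sum(w * v[d] for w, v in zip(weights, neighbors))
        for d in range(len(query))
    ]


def cross_inference(features):
    """Refine features[t][n] (frame t, node n) with two kinds of context.

    Assumption (not from the paper): each node gathers one message from all
    nodes in the same frame (spatial) and one from the same node in all frames
    (temporal), and both messages are added to the input as a residual.
    """
    num_frames, num_nodes = len(features), len(features[0])
    refined = []
    for t in range(num_frames):
        row = []
        for n in range(num_nodes):
            x = features[t][n]
            spatial = attend(x, [features[t][m] for m in range(num_nodes)])
            temporal = attend(x, [features[s][n] for s in range(num_frames)])
            row.append([xi + si + ti for xi, si, ti in zip(x, spatial, temporal)])
        refined.append(row)
    return refined


# Usage: 2 frames, 3 nodes (body regions or persons), 2-dimensional features.
toy = [
    [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]],
    [[0.2, 0.2], [0.4, 0.0], [0.1, 0.4]],
]
refined = cross_inference(toy)
print(len(refined), len(refined[0]), len(refined[0][0]))
```

The output keeps the input layout, so in this sketch the same routine could be applied once with body regions as nodes and again with persons as nodes. A trained model would add learned embeddings and normalization, which are omitted here.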