IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7001-7018. doi: 10.1109/TPAMI.2020.3032542. Epub 2023 May 5.
Learning to re-identify or retrieve a group of people across non-overlapping camera systems has important applications in video surveillance. However, most existing methods focus on (single) person re-identification (re-id), ignoring the fact that people often walk in groups in real scenarios. In this work, we take a step further and consider employing context information for identifying groups of people, i.e., group re-id. On the one hand, group re-id is more challenging than single person re-id, since it requires both robust modeling of local individual person appearance (under different illumination conditions, pose/viewpoint variations, and occlusions) and full awareness of global group structures (under group layout and group member variations). On the other hand, we believe that person re-id can be greatly enhanced by incorporating additional visual context from neighboring group members, a task which we formulate as group-aware (single) person re-id. In this paper, we propose a novel unified framework based on graph neural networks to simultaneously address these two group-based re-id tasks, i.e., group re-id and group-aware person re-id. Specifically, we construct a context graph with group members as its nodes to exploit dependencies among different people. A multi-level attention mechanism is developed to formulate both intra-group and inter-group context, with an additional self-attention module that produces robust graph-level representations by attentively aggregating node-level features. The proposed model can be directly generalized to tackle group-aware person re-id using node-level representations. Meanwhile, to facilitate the deployment of deep learning models on these tasks, we build a new group re-id dataset which contains more than 3.8K images with 1.5K annotated groups, an order of magnitude larger than existing group re-id datasets. Extensive experiments on the new dataset as well as three existing datasets clearly demonstrate the effectiveness of the proposed framework for both group-based re-id tasks.
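To make the idea of a context graph with attentive aggregation more concrete, the following is a minimal sketch in plain Python. It is not the architecture evaluated in the paper: the dot-product scoring, the residual update, the `query` vector, and the function names `intra_group_context` and `attentive_readout` are all assumptions introduced purely for illustration. Each group member is a node with a feature vector, node features are refined with context from the other members, and a graph-level vector is obtained as an attention-weighted sum of node features.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def intra_group_context(nodes: List[Vector]) -> List[Vector]:
    """Refine each member's feature with an attention-weighted mix of the
    other members' features (hypothetical dot-product attention plus a
    residual sum; an assumption, not the paper's formulation)."""
    updated = []
    for i, h_i in enumerate(nodes):
        others = [h for j, h in enumerate(nodes) if j != i]
        if not others:
            # A single-person "group" has no neighbors to attend to.
            updated.append(list(h_i))
            continue
        weights = softmax([dot(h_i, h_j) for h_j in others])
        context = [
            sum(w * h[d] for w, h in zip(weights, others))
            for d in range(len(h_i))
        ]
        updated.append([a + c for a, c in zip(h_i, context)])
    return updated


def attentive_readout(nodes: List[Vector], query: Vector) -> Vector:
    """Aggregate node-level features into one graph-level vector using
    attention weights scored against a (hypothetical) query vector."""
    weights = softmax([dot(query, h) for h in nodes])
    dim = len(nodes[0])
    return [sum(w * h[d] for w, h in zip(weights, nodes)) for d in range(dim)]


# Toy group of three members with 2-D appearance features (made-up values).
group = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
node_repr = intra_group_context(group)   # node-level: group-aware person re-id
group_repr = attentive_readout(node_repr, query=[1.0, 1.0])  # graph-level: group re-id
```

In this sketch the readout is a weighted sum, so the graph-level vector does not depend on the order in which members are listed, which is one plausible way to cope with changes in group layout. The node-level outputs correspond to the representations the abstract describes for group-aware person re-id.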