Chai Liangyu, Liu Yongtuo, Liu Wenxi, Han Guoqiang, He Shengfeng
IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2856-2871. doi: 10.1109/TPAMI.2020.3043372. Epub 2022 May 5.
In this paper, we introduce a novel yet challenging research problem, interactive crowd video generation, committed to producing diverse and continuous crowd videos and alleviating the shortage of annotated real-world datasets in crowd analysis. Our goal is to recursively generate realistic future crowd video frames given a few context frames, under user-specified guidance, namely the individual positions of the crowd. To this end, we propose a deep network architecture specifically designed for crowd video generation that is composed of two complementary modules, which address the problems of crowd dynamics synthesis and appearance preservation, respectively. In particular, a spatio-temporal transfer module is proposed to infer the crowd position and structure from guidance and temporal information, and a point-aware flow prediction module is presented to preserve appearance consistency through flow-based warping. The outputs of the two modules are then integrated by a self-selective fusion unit to produce an identity-preserving and continuous video. Unlike previous works, we generate continuous crowd behaviors without identity annotations or matching. Extensive experiments show that our method is effective for crowd video generation. More importantly, we demonstrate that the generated videos exhibit diverse crowd behaviors and can be used to augment different crowd analysis tasks, i.e., crowd counting, anomaly detection, and crowd video prediction. Code is available at https://github.com/Icep2020/CrowdGAN.
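The two-stream design described above can be illustrated with a minimal sketch: one stream's output is appearance-warped by a dense flow field, and a per-pixel soft mask blends it with the structure stream's output. This is a simplified stand-in, not the paper's implementation; the function names, the nearest-neighbor warping, and the sigmoid mask are all illustrative assumptions (in the actual method, both the flow and the fusion mask are predicted by learned modules).

```python
import numpy as np

def warp_by_flow(frame, flow):
    """Backward-warp a frame (H, W, C) by a dense flow field (H, W, 2).

    Uses nearest-neighbor sampling for simplicity; a hypothetical
    stand-in for the flow-based warping in the point-aware flow
    prediction module (which learns the flow from data).
    """
    h, w, _ = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def self_selective_fusion(structure_out, appearance_out, mask_logits):
    """Blend the two streams with a per-pixel soft mask.

    A hypothetical stand-in for the self-selective fusion unit:
    mask ~ 1 favors the structure stream, mask ~ 0 the appearance stream.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))  # sigmoid over logits
    return mask[..., None] * structure_out + (1.0 - mask[..., None]) * appearance_out
```

With a zero flow field the warp is an identity mapping, and a strongly positive mask reduces the fusion to the structure stream; in the full model both quantities vary per pixel and per frame.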