Cao Congqi, Zhang Hanwen, Lu Yue, Wang Peng, Zhang Yanning
IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):224-239. doi: 10.1109/TPAMI.2024.3461718. Epub 2024 Dec 4.
Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly, the scene-dependent anomaly, has been overlooked. The task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. We also introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network handles VAD on its own and handles VAA jointly with the backward network. In particular, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space that refines these features into a more compact representation for the task and, together with a scene information auto-encoder, generates scene-dependent features, modeling the relationships between video events and scenes. Finally, we develop a temporal loss based on key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method handles both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on the ShanghaiTech, CUHK Avenue, and proposed NWPU Campus datasets.
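As a toy illustration of what "scene-dependent" means here (the same event can be normal in one scene and anomalous in another), the sketch below fits a separate statistic per scene from normal-only training data, in the spirit of semi-supervised VAD. It is not the generative model described in the abstract: the scene names, the one-dimensional "speed" feature, the sample values, and the helper functions `fit_scene_models` and `anomaly_score` are all hypothetical and chosen only to make the idea concrete.

```python
from statistics import mean, pstdev


def fit_scene_models(normal_samples):
    """Fit one (mean, std) pair per scene from normal-only training features."""
    return {
        scene: (mean(values), pstdev(values))
        for scene, values in normal_samples.items()
    }


def anomaly_score(models, scene, value, eps=1e-6):
    """Distance of a feature from the scene's normal mean, in units of std."""
    mu, sigma = models[scene]
    return abs(value - mu) / (sigma + eps)


# Hypothetical normal-only training data: object speed observed in two scenes.
normal_samples = {
    "road": [4.5, 5.0, 5.5, 5.2, 4.8],    # fast motion (e.g. cycling) is normal
    "square": [0.9, 1.0, 1.1, 1.2, 0.8],  # only slow motion (walking) is normal
}
models = fit_scene_models(normal_samples)

# The same event (speed 5.0) is scored against each scene's own model.
score_on_road = anomaly_score(models, "road", 5.0)
score_on_square = anomaly_score(models, "square", 5.0)

print(f"road:   {score_on_road:.2f}")
print(f"square: {score_on_square:.2f}")
```

The identical observation receives a low score on the road and a high score on the square. A single scene-agnostic model pooled over both scenes could not separate the two cases, which is the gap that scene-dependent features are meant to close.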