
Multi-camera spatiotemporal deep learning framework for real-time abnormal behavior detection in dense urban environments.

Author information

Veesam Sai Babu, Rao B Tarakeswara, Begum Zarina, Patibandla R S M Lakshmi, Dcosta Arvin Arun, Bansal Shonak, Prakash Krishna, Faruque Mohammad Rashed Iqbal, Al-Mugren K S

Affiliations

School of Computer Science and Engineering, VIT-AP University, Amaravathi, 522241, India.

Department of C.S.E., KKR & KSR Institute of Technology and Sciences, Guntur, Andhra Pradesh, India.

Publication information

Sci Rep. 2025 Jul 23;15(1):26813. doi: 10.1038/s41598-025-12388-7.

Abstract

The growing density of today's urban environments demands a robust multi-camera architecture for real-time anomaly detection and behavior analysis. Most existing methods fail to detect unusual behaviors under occlusion, dynamic scene changes, and heavy computational load, leading to high false-positive rates and poor generalization to unseen anomalies. Both traditional graph-based methods and current CNN-RNN systems fail to capture complex social interactions and spatiotemporal dependencies, so they remain limited in crowded scenes. To address these drawbacks, this research proposes a multi-camera deep learning framework for abnormal behavior detection that exploits spatiotemporal information and integrates several new methodologies. Multi-Scale Graph Attention Networks (MS-GAT) provide interaction-aware anomaly detection, reducing false positives by up to 30%. A Reinforcement-Learning-based Dynamic Camera Attention Transformer (RL-DCAT) optimizes surveillance focus, cutting computational overhead by 40% and increasing recall by 15%. Spatiotemporal Inverse Contrastive Learning (STICL), built on an inverse contrastive anomaly memory, improves generalization to unseen rare anomalies with a 25% gain in recall. Neuromorphic event-based encoding enables fast action analysis through spiking neural networks, lowering detection latency by 60%. Finally, BGS-MFA synthesizes new abnormal behaviors through generative behavior synthesis and meta-learned few-shot adaptation, improving anomaly-detection generalization by 35%. Evaluation on the UCF-Crime, ShanghaiTech, and Avenue datasets showed a 40% reduction in false alarms, 50% lower computational demands, and 98% real-time efficiency for this multi-faceted framework.
The complete framework enables multi-camera crowd surveillance with adaptive scalability and resource provisioning for real-time behavioral anomaly detection in real-world settings.
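The abstract does not detail how MS-GAT computes interaction-aware scores, but the core mechanism it builds on (graph attention over person-to-person interaction graphs) can be sketched in a few lines. The snippet below is a minimal single-head, single-scale NumPy illustration under assumed conventions, not the authors' implementation; the function name, the toy anomaly score, and all shapes are hypothetical.

```python
import numpy as np

def graph_attention_layer(H, A, W, a, leaky_slope=0.2):
    """Single-head graph attention over person nodes (illustrative sketch).

    H : (N, F)  per-person features (e.g. motion/appearance descriptors)
    A : (N, N)  binary adjacency; A[i, j] = 1 if persons i and j interact
    W : (F, F2) learned linear projection
    a : (2*F2,) learned attention vector
    Returns (N, F2) interaction-aware embeddings.
    """
    Z = H @ W                       # project node features
    N = Z.shape[0]
    # attention logit e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([Z[i], Z[j]])
            e[i, j] = s if s > 0 else leaky_slope * s
    e = np.where(A > 0, e, -1e9)    # mask non-edges
    e -= e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over neighbors
    return alpha @ Z                # neighborhood-weighted embeddings

def anomaly_scores(emb):
    """Toy score: distance of each node from the scene's mean embedding."""
    return np.linalg.norm(emb - emb.mean(axis=0), axis=1)
```

A node whose embedding sits far from the crowd consensus gets a high score; the paper's multi-scale variant would apply such layers over graphs built at several spatial/temporal scales and learn `W` and `a` end-to-end.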


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ecc/12287515/34edf243ea7f/41598_2025_12388_Fig1_HTML.jpg
