Department of Automation, Faculty of Electrical and Electronic Engineering, Kaunas University of Technology, 51367 Kaunas, Lithuania.
Sensors (Basel). 2022 Mar 13;22(6):2216. doi: 10.3390/s22062216.
Intelligent video surveillance systems are rapidly being introduced to public places. The adoption of computer vision and machine learning techniques enables various applications for collected video features; one of the major applications is safety monitoring. The efficacy of a violent event detection system is measured by its efficiency and accuracy. In this paper, we present a novel architecture for violence detection from video surveillance cameras. Our proposed model uses a U-Net-like network with MobileNet V2 as the encoder for spatial feature extraction, followed by an LSTM for temporal feature extraction and classification. The proposed model is computationally light and still achieves good results: experiments showed an average accuracy of 0.82 ± 2% and an average precision of 0.81 ± 3% on a complex real-world security camera footage dataset based on RWF-2000.
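For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and precision can be computed for a binary violent/non-violent clip classifier, and how a mean with a spread can be summarized over repeated runs. The abstract does not state how the ± figures were obtained, so the use of a sample standard deviation here is an assumption. The function names and the toy labels are hypothetical and are not taken from the paper or from RWF-2000.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted label matches the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of clips predicted as violent that are actually violent."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    # Return 0.0 when nothing was predicted positive (an assumed convention).
    return tp / (tp + fp) if (tp + fp) else 0.0


def summarize(scores):
    """Mean and sample standard deviation of a metric over several runs."""
    return mean(scores), stdev(scores)


# Hypothetical toy labels: 1 = violent clip, 0 = non-violent clip.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)    # 6 of 8 clips correct -> 0.75
prec = precision(y_true, y_pred)  # 3 true positives, 1 false positive -> 0.75
```

Precision matters in this setting because every false positive is a non-violent clip flagged for operator attention, while accuracy also counts the correctly ignored non-violent clips.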