An Efficient Anomaly Recognition Framework Using an Attention Residual LSTM in Surveillance Videos

Affiliation

Sejong University, Seoul 143-747, Korea.

Publication information

Sensors (Basel). 2021 Apr 16;21(8):2811. doi: 10.3390/s21082811.

DOI:10.3390/s21082811

PMID:33923712

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8072779/

Abstract

Video anomaly recognition in smart cities is an important computer vision task that plays a vital role in smart surveillance and public safety, but it is challenging because anomalous events are diverse, complex, and infrequent in real-time surveillance environments. Various deep learning models require large amounts of training data, generalize poorly, and have high time complexity. To overcome these problems, we present an efficient, lightweight convolutional neural network (CNN)-based anomaly recognition framework that operates in a surveillance environment with reduced time complexity. We extract spatial CNN features from a series of video frames and feed them to the proposed residual attention-based long short-term memory (LSTM) network, which can precisely recognize anomalous activity in surveillance videos. Combining representative CNN features with residual blocks in the LSTM for sequence learning proves effective for anomaly detection and recognition, supporting the model's use in smart-city video surveillance. Extensive experiments on the real-world benchmark UCF-Crime dataset validate the effectiveness of the proposed model in complex surveillance environments, and the model outperforms state-of-the-art models with accuracy increases of 1.77%, 0.76%, and 8.62% on the UCF-Crime, UMN, and Avenue datasets, respectively.
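
The pipeline described in the abstract (per-frame CNN features fed to a residual, attention-based LSTM) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, the exact placement of the residual connection, or the form of the attention, so the feature dimension, hidden size, number of classes, the projected skip connection `W_res`, and the single scoring vector `v_att` are all hypothetical choices. The weights are random and untrained, and random vectors stand in for real CNN features.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()


def init_params(feat_dim, hidden_dim, num_classes, seed=0):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        # One stacked matrix for the four LSTM gates (input, forget, output, candidate).
        "W": rng.normal(0, s, (4 * hidden_dim, feat_dim + hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        # Assumed projection so the frame feature can be added to the hidden state.
        "W_res": rng.normal(0, s, (hidden_dim, feat_dim)),
        # Assumed attention scoring vector (one scalar score per time step).
        "v_att": rng.normal(0, s, hidden_dim),
        "W_out": rng.normal(0, s, (num_classes, hidden_dim)),
        "b_out": np.zeros(num_classes),
    }


def residual_attention_lstm(features, p):
    """features: array of shape (T, feat_dim), one CNN feature vector per frame.

    Returns (class probabilities, attention weights over the T frames).
    """
    hidden_dim = p["W_res"].shape[0]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x in features:
        z = p["W"] @ np.concatenate([x, h]) + p["b"]
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
        h = sigmoid(o) * np.tanh(c)                   # hidden state
        outputs.append(h + p["W_res"] @ x)            # residual (skip) connection
    outputs = np.stack(outputs)                       # shape (T, hidden_dim)
    weights = softmax(outputs @ p["v_att"])           # attention over time steps
    context = weights @ outputs                       # weighted sum, shape (hidden_dim,)
    probs = softmax(p["W_out"] @ context + p["b_out"])
    return probs, weights


# Usage: 30 frames with 64-dimensional stand-in features and 5 hypothetical classes.
frames = np.random.default_rng(1).normal(size=(30, 64))
params = init_params(feat_dim=64, hidden_dim=32, num_classes=5)
probs, weights = residual_attention_lstm(frames, params)
print(probs.shape, weights.shape)
```

In this sketch the attention weights indicate how much each frame contributes to the clip-level prediction, and the skip connection lets the frame feature bypass the recurrent update; whether the published model combines the two in this way is an assumption.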
