Md Haidar Sharif, Lei Jiao, Christian W Omlin
Department of ICT, University of Agder, 4630 Kristiansand, Norway.
Sensors (Basel). 2023 Sep 7;23(18):7734. doi: 10.3390/s23187734.
Video anomaly event detection (VAED) is one of the key technologies in computer vision for smart surveillance systems. With the advent of deep learning, contemporary advances in VAED have achieved substantial success. Recently, weakly supervised VAED (WVAED) has become a popular line of VAED research. WVAED methods do not depend on a supplementary self-supervised surrogate task, yet they can estimate anomaly scores directly. However, the performance of WVAED methods depends on pretrained feature extractors. In this paper, we first exploit two types of pretrained feature extractors, CNN-based (e.g., C3D and I3D) and ViT-based (e.g., CLIP), to effectively extract discriminative representations. We then consider long-range and short-range temporal dependencies and identify video snippets of interest by leveraging our proposed temporal self-attention network (TSAN). We design a multiple instance learning (MIL)-based generalized architecture named CNN-ViT-TSAN, which uses CNN- and/or ViT-extracted features together with TSAN to specify a family of models for the WVAED problem. Experimental results on publicly available popular crowd datasets demonstrate the effectiveness of our CNN-ViT-TSAN.
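To make the MIL setting concrete: in weakly supervised VAED, each video is a "bag" of snippets with only a video-level label, and a common training signal (the classic ranking formulation; the paper's CNN-ViT-TSAN builds on MIL, and its exact loss may differ) is a hinge that pushes the top-scoring snippet of an anomalous bag above the top-scoring snippet of a normal bag. A minimal sketch, with the `margin` value and toy scores as illustrative assumptions:

```python
def mil_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Classic MIL ranking hinge for weakly supervised anomaly detection:
    the highest anomaly score in an anomalous (positive) bag should exceed
    the highest score in a normal (negative) bag by at least `margin`.
    Only bag-level labels are needed; snippet labels stay unknown."""
    return max(0.0, margin - max(pos_scores) + max(neg_scores))

# Toy snippet-level anomaly scores (hypothetical values, one bag pair):
pos = [0.1, 0.9, 0.3]  # snippets from a video labeled anomalous
neg = [0.2, 0.4, 0.1]  # snippets from a video labeled normal
loss = mil_ranking_loss(pos, neg)  # 1.0 - 0.9 + 0.4 = 0.5
```

In practice the snippet scores would come from a scoring head over the CNN/ViT features (after temporal modeling such as TSAN), and the loss is averaged over many positive/negative bag pairs per batch.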