Multimodal and multiscale feature fusion for weakly supervised video anomaly detection.

Authors

Sun Wenwen, Cao Lin, Guo Yanan, Du Kangning

Affiliations

Key Laboratory of the Ministry of Education for Optoelectronic Measurement Technology and Instrument, Beijing Information Science and Technology University, Beijing, 100192, China.

School of Instrument Science and Opto-Electronics Engineering, Beijing Information Science and Technology University, Beijing, 100192, China.

Publication

Sci Rep. 2024 Oct 1;14(1):22835. doi: 10.1038/s41598-024-73462-0.

Abstract

Weakly supervised video anomaly detection aims to detect anomalous events with only video-level labels. In the absence of boundary information for anomaly segments, most existing methods rely on multiple instance learning. In these approaches, the predictions for unlabeled video snippets are guided by the classification of labeled untrimmed videos. However, these methods do not account for issues such as video blur and visual occlusion, which can hinder accurate anomaly detection. To address these issues, we propose a novel weakly supervised video anomaly detection method that fuses multimodal and multiscale features. Firstly, RGB and optical flow snippets are input into pre-trained I3D to extract appearance and motion features. Then, we introduce an Attention De-redundancy (AD) module, which employs an attention mechanism to filter out task-irrelevant redundancy in these appearance and motion features. Next, to mitigate the effects of video blurring and visual occlusion, we propose a Multi-scale Feature Learning module. This module captures long-term and short-term temporal dependencies among video snippets to provide global and local guidance for blurred or occluded video snippets. Finally, to effectively utilize the discriminative features of different modalities, we propose an Adaptive Feature Fusion module. This module adaptively fuses appearance and motion features based on their respective feature weights. Extensive experimental results demonstrate that our proposed method outperforms mainstream unsupervised and weakly supervised methods in terms of AUC. Specifically, our proposed method achieves 97.00% AUC and 85.31% AUC on two benchmark datasets, i.e., ShanghaiTech and UCF-Crime, respectively.
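To make the Adaptive Feature Fusion idea concrete, here is a minimal, self-contained sketch in plain Python. It is not the paper's implementation: the per-modality scoring is simplified to a fixed linear projection (`w_app`, `w_mot` are hypothetical parameters standing in for the learned gating), and the softmax-normalised scores serve as the adaptive fusion weights applied to the appearance and motion features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fusion(appearance, motion, w_app, w_mot):
    """Fuse appearance and motion feature vectors with adaptive weights.

    Toy stand-in for the paper's Adaptive Feature Fusion module:
    each modality receives a scalar score (here, a dot product with a
    fixed projection vector rather than a learned gate), the scores are
    softmax-normalised into fusion weights, and the fused feature is
    the weighted sum of the two modality features.
    """
    score_app = sum(a * w for a, w in zip(appearance, w_app))
    score_mot = sum(m * w for m, w in zip(motion, w_mot))
    alpha = softmax([score_app, score_mot])  # adaptive per-modality weights
    return [alpha[0] * a + alpha[1] * m for a, m in zip(appearance, motion)]
```

In the paper the weights are produced by a learned module conditioned on the features themselves; the sketch keeps only the core design choice, that the two modalities are combined by data-dependent convex weights rather than fixed averaging or concatenation.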

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/485a/11445271/18238614b1ec/41598_2024_73462_Fig1_HTML.jpg
