Transformer and Adaptive Threshold Sliding Window for Improving Violence Detection in Videos.

Author information

Rendón-Segador Fernando J, Álvarez-García Juan A, Soria-Morillo Luis M

Affiliation

Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, 41012 Sevilla, Spain.

Publication information

Sensors (Basel). 2024 Aug 22;24(16):5429. doi: 10.3390/s24165429.

Abstract

This paper presents a comprehensive approach to detecting violent events in videos. It combines CrimeNet, a Vision Transformer (ViT) model with structured neural learning and adversarial regularization, with an adaptive threshold sliding window model based on the Transformer architecture. CrimeNet performs exceptionally well on all datasets (XD-Violence, UCF-Crime, NTU-CCTV Fights, UBI-Fights, Real Life Violence Situations, MediEval, RWF-2000, Hockey Fights, Violent Flows, Surveillance Camera Fights, and Movies Fight), achieving high AUC ROC and AUC PR values (up to 99% and 100%, respectively). However, CrimeNet generalized poorly in cross-dataset experiments, where performance dropped by 20-30%; for instance, training on UCF-Crime and testing on XD-Violence yielded an AUC ROC of 70.20%. The sliding window model with adaptive thresholding addresses this problem by automatically adjusting the violence detection threshold. Applied as post-processing to CrimeNet results, it improved detection accuracy by 10% to 15% in cross-dataset experiments. Future lines of research include improving generalization, addressing data imbalance, exploring multimodal representations, testing in real-world applications, and extending the approach to complex human interactions.
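
To make the post-processing idea concrete, the following is a minimal sketch of adaptive thresholding over a sliding window of per-frame violence scores. It is not the authors' method: the abstract describes a Transformer-based sliding window model, whereas this sketch swaps in a simple statistical rule (smoothed mean plus a multiple of the standard deviation, clamped to a range). The function name `adaptive_window_detect` and the parameters `window`, `k`, `lo`, and `hi` are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def adaptive_window_detect(scores, window=3, k=0.5, lo=0.2, hi=0.7):
    """Flag frames as violent using a per-video adaptive threshold.

    Illustrative assumption, not the paper's Transformer-based model:
      1. Smooth the raw scores with a centered sliding-window mean.
      2. Set the threshold to mean + k * std of the smoothed scores,
         clamped to the range [lo, hi].
      3. Flag every frame whose smoothed score exceeds the threshold.
    """
    half = window // 2
    # Centered window, truncated at the start and end of the video.
    smoothed = [
        mean(scores[max(0, i - half): i + half + 1])
        for i in range(len(scores))
    ]
    threshold = mean(smoothed) + k * pstdev(smoothed)
    threshold = min(hi, max(lo, threshold))
    flags = [s > threshold for s in smoothed]
    return flags, threshold


# Hypothetical scores from a model evaluated on an unseen dataset: the
# violent segment (frames 3-5) never reaches a fixed 0.5 cut-off.
shifted_scores = [0.05, 0.05, 0.05, 0.4, 0.45, 0.4, 0.05, 0.05]
flags, threshold = adaptive_window_detect(shifted_scores)
print(flags)  # [False, False, False, True, True, True, False, False]
```

In this toy example a fixed threshold of 0.5 would miss the event entirely, while the threshold derived from the video's own score distribution recovers it. The lower clamp `lo` keeps a uniformly low-scoring video from producing spurious detections.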

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ccc/11359545/8a3b99655994/sensors-24-05429-g001.jpg