Distilling Privileged Knowledge for Anomalous Event Detection From Weakly Labeled Videos.

Authors

Liu Tianshan, Lam Kin-Man, Kong Jun

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12627-12641. doi: 10.1109/TNNLS.2023.3263966. Epub 2024 Sep 3.

DOI:10.1109/TNNLS.2023.3263966
PMID:37037244
Abstract

Weakly supervised video anomaly detection (WS-VAD) aims to identify the snippets involving anomalous events in long untrimmed videos, with solely video-level binary labels. A typical paradigm among the existing WS-VAD methods is to employ multiple modalities as inputs, e.g., RGB, optical flow, and audio, as they can provide sufficient discriminative clues that are robust to diverse, complicated real-world scenes. However, such a pipeline relies heavily on the availability of multiple modalities and is computationally expensive and storage-demanding when processing long sequences, which limits its use in some applications. To address this dilemma, we propose a privileged knowledge distillation (KD) framework dedicated to the WS-VAD task, which can maintain the benefits of exploiting additional modalities while avoiding the need for multimodal data in the inference phase. We argue that the performance of the privileged KD framework mainly depends on two factors: 1) the effectiveness of the multimodal teacher network and 2) the completeness of the useful information transfer. To obtain a reliable teacher network, we propose a cross-modal interactive learning strategy and an anomaly-normal discrimination loss, which target learning task-specific cross-modal features and encourage the separability of anomalous and normal representations, respectively. Furthermore, we design both representation-level and logits-level distillation loss functions, which force the unimodal student network to distill abundant privileged knowledge from the well-trained multimodal teacher network, in a snippet-to-video fashion. Extensive experimental results on three public benchmarks demonstrate that the proposed privileged KD framework can train a lightweight yet effective detector for localizing anomalous events under the supervision of video-level annotations.
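
The abstract does not give the form of the two distillation losses. As a rough illustration only, the sketch below shows one generic way a representation-level and a logits-level distillation term could be combined at both snippet and video level. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, mean pooling as the snippet-to-video aggregation, mean squared error for features, a Bernoulli KL divergence on temperature-softened scores, and the temperature `tau`.

```python
import math


def sigmoid(z):
    """Map a raw anomaly logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_pool(snippets):
    """Average per-snippet feature vectors into one video-level vector (assumed aggregation)."""
    n = len(snippets)
    return [sum(col) / n for col in zip(*snippets)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def representation_loss(student_feats, teacher_feats):
    """Hypothetical representation-level term: snippet-level MSE plus video-level MSE."""
    snippet_term = sum(
        mse(s, t) for s, t in zip(student_feats, teacher_feats)
    ) / len(student_feats)
    video_term = mse(mean_pool(student_feats), mean_pool(teacher_feats))
    return snippet_term + video_term


def bernoulli_kl(p, q, eps=1e-7):
    """KL(Bernoulli(p) || Bernoulli(q)), with clamping for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def logits_loss(student_logits, teacher_logits, tau=2.0):
    """Hypothetical logits-level term: the student matches the teacher's softened scores."""
    n = len(student_logits)
    snippet_term = sum(
        bernoulli_kl(sigmoid(t / tau), sigmoid(s / tau))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    # Video-level score: mean of the snippet logits (an assumed pooling choice).
    video_s = sum(student_logits) / n
    video_t = sum(teacher_logits) / n
    video_term = bernoulli_kl(sigmoid(video_t / tau), sigmoid(video_s / tau))
    return snippet_term + video_term


if __name__ == "__main__":
    teacher_feats = [[1.0, 1.0], [1.0, 1.0]]   # e.g. fused multimodal features
    student_feats = [[0.0, 0.0], [0.0, 0.0]]   # e.g. RGB-only features
    print(representation_loss(student_feats, teacher_feats))
    print(logits_loss([0.1, -0.3], [2.0, -1.5]))
```

In this sketch only the teacher sees the extra modalities during training; at inference the student would run alone on its single modality, which is the property the abstract emphasizes.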


Similar articles

1
Distilling Privileged Knowledge for Anomalous Event Detection From Weakly Labeled Videos.
IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12627-12641. doi: 10.1109/TNNLS.2023.3263966. Epub 2024 Sep 3.
2
Injecting Text Clues for Improving Anomalous Event Detection From Weakly Labeled Videos.
IEEE Trans Image Process. 2024;33:5907-5920. doi: 10.1109/TIP.2024.3477351. Epub 2024 Oct 18.
3
Multimodal and multiscale feature fusion for weakly supervised video anomaly detection.
Sci Rep. 2024 Oct 1;14(1):22835. doi: 10.1038/s41598-024-73462-0.
4
Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model.
IEEE Trans Image Process. 2024;33:2213-2225. doi: 10.1109/TIP.2024.3374070. Epub 2024 Mar 25.
5
Weakly Supervised Video Anomaly Detection via Self-Guided Temporal Discriminative Transformer.
IEEE Trans Cybern. 2024 May;54(5):3197-3210. doi: 10.1109/TCYB.2022.3227044. Epub 2024 Apr 16.
6
Learning With Privileged Multimodal Knowledge for Unimodal Segmentation.
IEEE Trans Med Imaging. 2022 Mar;41(3):621-632. doi: 10.1109/TMI.2021.3119385. Epub 2022 Mar 2.
7
Comprehensive learning and adaptive teaching: Distilling multi-modal knowledge for pathological glioma grading.
Med Image Anal. 2024 Jan;91:102990. doi: 10.1016/j.media.2023.102990. Epub 2023 Oct 9.
8
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection.
IEEE Trans Image Process. 2024;33:4923-4936. doi: 10.1109/TIP.2024.3451935. Epub 2024 Sep 11.
9
Localizing Anomalies From Weakly-Labeled Videos.
IEEE Trans Image Process. 2021;30:4505-4515. doi: 10.1109/TIP.2021.3072863. Epub 2021 Apr 28.
10
Unsupervised Anomaly Detection with Distillated Teacher-Student Network Ensemble.
Entropy (Basel). 2021 Feb 6;23(2):201. doi: 10.3390/e23020201.