Liu Tianshan, Lam Kin-Man, Bao Bing-Kun
IEEE Trans Image Process. 2024;33:5907-5920. doi: 10.1109/TIP.2024.3477351. Epub 2024 Oct 18.
Video anomaly detection (VAD) aims to localize the snippets that contain anomalous events in long, unconstrained videos. The weakly supervised (WS) setting, in which only video-level labels are available during training, has attracted considerable attention because it offers a favorable trade-off between detection performance and annotation cost. However, lacking dense snippet-level labels, existing WS-VAD methods remain prone to detection errors caused by false alarms and incomplete localization. To address this problem, we propose injecting text clues about anomaly-event categories into WS-VAD through a dedicated dual-branch framework. To suppress responses to confusing normal contexts, we first present a text-guided anomaly discovering (TAG) branch based on a hierarchical matching scheme, which uses label-text queries to search for discriminative anomalous snippets in a global-to-local fashion. To make anomaly-instance localization more complete, we further design an anomaly-conditioned text completion (ATC) branch that performs an auxiliary generative task: to fully reconstruct a masked description sentence, the model is forced to gather sufficient event semantics from all the relevant anomalous snippets. Furthermore, to encourage cross-branch knowledge sharing, we introduce a mutual learning strategy that imposes a consistency constraint on the anomaly scores of the two branches. Extensive experiments on two public benchmarks show that the proposed method outperforms competing methods.
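The consistency constraint used for mutual learning can be illustrated with a minimal sketch. The abstract does not give the form of the constraint, so the mean squared difference below is an assumption chosen for simplicity, and the function name `consistency_loss` and its arguments are hypothetical. Each argument is taken to hold one anomaly score per snippet from the TAG and ATC branches, respectively.

```python
from typing import Sequence


def consistency_loss(scores_tag: Sequence[float],
                     scores_atc: Sequence[float]) -> float:
    """Mean squared difference between two branches' per-snippet scores.

    Assumption: the source only says a consistency constraint is imposed on
    the anomaly scores of the two branches; the squared-error form used here
    is illustrative, not the paper's stated loss.
    """
    if len(scores_tag) != len(scores_atc):
        raise ValueError("both branches must score the same snippets")
    if len(scores_tag) == 0:
        raise ValueError("need at least one snippet")
    # Penalize disagreement snippet by snippet, then average over the video.
    squared_gaps = [(a - b) ** 2 for a, b in zip(scores_tag, scores_atc)]
    return sum(squared_gaps) / len(squared_gaps)


# Hypothetical scores for a four-snippet video.
tag_scores = [0.1, 0.8, 0.9, 0.2]
atc_scores = [0.2, 0.6, 0.9, 0.1]
loss = consistency_loss(tag_scores, atc_scores)
```

The penalty is zero when the two branches agree on every snippet and grows as their scores diverge. Added to each branch's own training objective, such a term would push the branches toward shared predictions.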