Wu Kewei, Luo Wenjie, Xie Zhao, Guo Dan, Zhang Zhao, Hong Richang
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4560-4574. doi: 10.1109/TNNLS.2024.3377468. Epub 2025 Feb 28.
Weakly supervised temporal action localization (TAL) aims to localize action instances in untrimmed videos using only video-level action labels. Without snippet-level labels, it is hard to assign every snippet an accurate action/background category. The main difficulties are the large variations introduced by unconstrained background snippets and by the multiple subactions within action snippets. Existing prototype models describe snippets by covering them with clusters (defined as prototypes). In this work, we argue that clustered prototypes covering snippets with simple variations still misclassify snippets with large variations. We propose an ensemble prototype network (EPNet), which ensembles prototypes learned with consensus-aware clustering. The network stacks a consensus prototype learning (CPL) module and an ensemble snippet weight learning (ESWL) module into one stage and extends one stage to multiple stages in an ensemble learning manner. The CPL module learns a consensus matrix by estimating the similarity of clustering labels between two successive clustering generations. The consensus matrix optimizes the clustering to learn consensus prototypes, which predict snippets with consensus labels. The ESWL module estimates the weights of misclassified snippets using the snippet-level loss. These weights update the posterior probabilities of the snippets in the clustering to learn prototypes in the next stage. We use multiple stages to learn multiple prototypes, which can cover snippets with large variations for accurate snippet classification. Extensive experiments show that our method outperforms state-of-the-art weakly supervised TAL methods on three benchmark datasets, that is, THUMOS'14, ActivityNet v1.2, and ActivityNet v1.3.
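The abstract describes the CPL module's consensus matrix as a similarity of clustering labels between two successive clustering generations. A minimal sketch of one way such a matrix could be computed is shown below, using a co-association form over snippet pairs; the function name and the exact averaging scheme are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def consensus_matrix(labels_prev, labels_curr):
    """Co-association consensus between two clustering generations.

    C[i, j] is the fraction of the two generations in which snippets
    i and j receive the same cluster label (so 1.0, 0.5, or 0.0).
    High-consensus pairs can then guide the next clustering round
    toward consensus prototypes.
    """
    labels_prev = np.asarray(labels_prev)
    labels_curr = np.asarray(labels_curr)
    # Pairwise same-cluster indicators for each generation (broadcasting).
    same_prev = (labels_prev[:, None] == labels_prev[None, :]).astype(float)
    same_curr = (labels_curr[:, None] == labels_curr[None, :]).astype(float)
    # Average agreement across the two successive generations.
    return (same_prev + same_curr) / 2.0
```

For example, with cluster labels `[0, 0, 1]` in one generation and `[0, 1, 1]` in the next, snippets 0 and 1 agree in only one of the two generations, so their consensus entry is 0.5, while each snippet has consensus 1.0 with itself.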