
Context Sensitive Network for weakly-supervised fine-grained temporal action localization.

Author information

Dong Cerui, Liu Qinying, Wang Zilei, Zhang Yixin, Zhao Feng

Affiliations

National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, 230026, China.

Publication information

Neural Netw. 2025 May;185:107140. doi: 10.1016/j.neunet.2025.107140. Epub 2025 Jan 24.

Abstract

Weakly-supervised fine-grained temporal action localization seeks to identify fine-grained action instances in untrimmed videos using only video-level labels. The primary challenge in this task arises from the subtle distinctions among various fine-grained action categories, which complicate the accurate localization of specific action instances. In this paper, we note that the context information embedded within the videos plays a crucial role in overcoming this challenge. However, we also find that effectively integrating context information across different scales is non-trivial, as not all scales provide equally valuable information for distinguishing fine-grained actions. Based on these observations, we propose a weakly-supervised fine-grained temporal action localization approach termed the Context Sensitive Network, which aims to fully leverage context information. Specifically, we first introduce a multi-scale context extraction module designed to efficiently capture multi-scale temporal contexts. Subsequently, we develop a scale-sensitive context gating module that facilitates interaction among multi-scale contexts and adaptively selects informative contexts based on varying video content. Extensive experiments conducted on two benchmark datasets, FineGym and FineAction, demonstrate that our approach achieves state-of-the-art performance.
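The two components named in the abstract can be pictured with a minimal numpy sketch: temporal contexts are extracted at several window sizes, and a per-time-step gate softly selects among the scales. The paper's modules are learned networks; the pooling windows and the gating heuristic below are illustrative assumptions only, not the authors' implementation.

```python
import numpy as np

def multi_scale_context(x, scales=(1, 3, 5)):
    """x: (T, D) snippet features -> list of (T, D) contexts, one per scale.
    Each scale averages features over a centered temporal window of width k."""
    T, _ = x.shape
    contexts = []
    for k in scales:
        r = k // 2
        ctx = np.empty_like(x)
        for t in range(T):
            lo, hi = max(0, t - r), min(T, t + r + 1)
            ctx[t] = x[lo:hi].mean(axis=0)
        contexts.append(ctx)
    return contexts

def scale_gating(contexts):
    """Fuse multi-scale contexts with a per-time-step softmax over scales.
    The gate logits here are simply each context's mean activation, a
    stand-in for the learned scale-sensitive gating module."""
    stack = np.stack(contexts)                      # (S, T, D)
    logits = stack.mean(axis=2)                     # (S, T)
    gates = np.exp(logits - logits.max(axis=0))
    gates = gates / gates.sum(axis=0)               # softmax over the S scales
    return (gates[..., None] * stack).sum(axis=0)   # (T, D)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))                 # 8 video snippets, 4-dim features
fused = scale_gating(multi_scale_context(feats))
print(fused.shape)                                  # (8, 4)
```

The gate weights depend on the video content at every time step, which is the intuition behind "adaptively selects informative contexts": for a given snippet, scales whose pooled context is uninformative receive low weight in the fused representation.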

