Online action proposal generation using spatio-temporal attention network.

Affiliations

Graduate School of Artificial Intelligence, Kyungpook National University, Daegu, 41566, South Korea.

KNU-LG Electronics Convergence Research Center, AI Institute of Technology, Kyungpook National University, Daegu, 41566, South Korea.

Publication information

Neural Netw. 2022 Sep;153:518-529. doi: 10.1016/j.neunet.2022.06.032. Epub 2022 Jun 30.

DOI:10.1016/j.neunet.2022.06.032

PMID:35835013

Abstract

Temporal action proposal generation aims to generate temporal boundaries that contain action instances. In real-time applications such as surveillance cameras, autonomous driving, and traffic monitoring, the online localization and recognition of human activities occurring in short temporal intervals is an important area of research. Existing approaches to temporal action proposal generation consider only offline, frame-level feature aggregation along the temporal dimension. These offline methods also generate many redundant, irrelevant proposal regions in the frames as temporal boundaries, which leads to higher computational cost and slower processing and makes them unsuitable for online tasks. In this study, we propose a novel spatio-temporal attention network for online action proposal generation, in contrast to existing offline proposal generation methods. The proposed approach incorporates the inter-dependency between the spatial and temporal context information of each incoming video clip to generate more relevant online temporal action proposals. First, we propose a windowed spatial attention module to capture the inter-spatial relationship between the features of incoming frames. The windowed spatial network produces a more robust clip-level feature representation and efficiently deals with noisy features such as occlusion or background scenes. Second, we introduce a temporal attention module that captures relevant temporal dynamics jointly with the localized spatial information in order to model long inter-frame temporal relationships, since most online real-life videos are untrimmed. By applying these two attention modules sequentially, the proposed spatio-temporal network is able to generate precise action boundaries at a particular instant of time. In addition, the model generates fewer, discriminative temporal action proposals while maintaining a low computational cost and a high processing speed suitable for online settings.
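
The abstract describes two attention modules applied in sequence: windowed spatial attention within each incoming frame, followed by temporal attention across frames. The sketch below illustrates only that ordering and is not the authors' implementation. Several details are assumptions that the abstract does not specify: plain scaled dot-product self-attention with no learned projections, non-overlapping windows over a flat list of spatial positions, and mean pooling to reduce each stage to a single vector. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption).

    tokens: list of equal-length feature vectors.
    Returns one attended vector per input token.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        attended.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return attended


def mean_pool(vectors):
    """Average a list of vectors into a single vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def windowed_spatial_attention(frame, window_size):
    """Attend within non-overlapping windows of spatial positions,
    then pool all positions into one frame-level vector (assumed pooling)."""
    attended = []
    for start in range(0, len(frame), window_size):
        attended.extend(self_attention(frame[start:start + window_size]))
    return mean_pool(attended)


def clip_feature(clip, window_size):
    """Spatial attention per frame first, then temporal attention across frames."""
    frame_features = [windowed_spatial_attention(f, window_size) for f in clip]
    return mean_pool(self_attention(frame_features))


# Toy clip: 3 frames, 4 spatial positions per frame, 2-dimensional features.
clip = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    [[0.8, 0.2], [1.0, 0.0], [0.2, 0.8], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]
feature = clip_feature(clip, window_size=2)
```

Restricting attention to a window keeps the per-frame cost proportional to the window size rather than to the full number of spatial positions, which is one plausible reading of why a windowed design suits the online setting. A real model would add learned query, key, and value projections and a proposal head on top of the clip feature.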

Similar articles

Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals.

Sensors (Basel). 2019 Mar 3;19(5):1085. doi: 10.3390/s19051085.

MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module.

Sensors (Basel). 2022 Sep 1;22(17):6595. doi: 10.3390/s22176595.

Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition.

Sensors (Basel). 2023 Feb 3;23(3):1707. doi: 10.3390/s23031707.

Robust Online Tracking via Contrastive Spatio-Temporal Aware Network.

IEEE Trans Image Process. 2021;30:1989-2002. doi: 10.1109/TIP.2021.3050314. Epub 2021 Jan 20.

Multi-Level Content-Aware Boundary Detection for Temporal Action Proposal Generation.

IEEE Trans Image Process. 2023;32:6090-6101. doi: 10.1109/TIP.2023.3328471. Epub 2023 Nov 8.

YoTube: Searching Action Proposal Via Recurrent and Static Regression Networks.

IEEE Trans Image Process. 2018 Jun;27(6):2609-2622. doi: 10.1109/TIP.2018.2806279.

Real-Time Video Super-Resolution with Spatio-Temporal Modeling and Redundancy-Aware Inference.

Sensors (Basel). 2023 Sep 14;23(18):7880. doi: 10.3390/s23187880.

STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video.

PLoS One. 2022 Mar 17;17(3):e0265115. doi: 10.1371/journal.pone.0265115. eCollection 2022.

Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.

IEEE Trans Image Process. 2018 Mar;27(3):1347-1360. doi: 10.1109/TIP.2017.2778563. Epub 2017 Nov 29.
