A Temporal-Aware Relation and Attention Network for Temporal Action Localization.

Authors

Zhao Yibo, Zhang Hua, Gao Zan, Guan Weili, Nie Jie, Liu Anan, Wang Meng, Chen Shengyong

Publication Information

IEEE Trans Image Process. 2022;31:4746-4760. doi: 10.1109/TIP.2022.3182866. Epub 2022 Jul 14.

Abstract

Temporal action localization is currently an active research topic in computer vision and machine learning due to its use in smart surveillance. It is a challenging problem, since the action categories must be classified in untrimmed videos and the start and end of each action must be accurately located. Although many temporal action localization methods have been proposed, they require substantial computational resources for training and inference. To address these issues, a novel temporal-aware relation and attention network (abbreviated as TRA) is proposed in this work for the temporal action localization task. TRA has an anchor-free, end-to-end architecture that fully exploits temporal-aware information. Specifically, a temporal self-attention module is first designed to determine the relationships between different temporal positions, giving more weight to features within actions. Then, a multiple temporal aggregation module is constructed to aggregate information in the temporal domain. Finally, a graph relation module is designed to obtain aggregated graph features, which are used to refine the boundaries and the classification results. Most importantly, these three modules are jointly explored in a unified framework, and temporal awareness is fully exploited throughout. Extensive experiments demonstrate that the proposed method outperforms all state-of-the-art methods on the THUMOS14 dataset, reaching an average mAP of 67.6%, and obtains a comparable result on the ActivityNet1.3 dataset, with an average mAP of 34.4%. Compared with A2Net (TIP20), PCG-TAL (TIP21), and AFSD (CVPR21), TRA achieves improvements of 11.7%, 4.4%, and 1.8%, respectively, on the THUMOS14 dataset.
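To make the abstract's first component concrete, the sketch below illustrates the general idea of a temporal self-attention module operating over a sequence of clip features, where every temporal position attends to all others. This is a minimal, assumption-laden illustration, not the authors' implementation: the class name, head count, residual-plus-normalization layout, and feature dimensions are all hypothetical.

```python
# Minimal sketch of temporal self-attention over a (batch, time, channels)
# clip-feature sequence; attention is computed across temporal positions,
# so features inside an action can receive larger weights.
# All names and dimensions are illustrative assumptions, not TRA's code.
import torch
import torch.nn as nn


class TemporalSelfAttention(nn.Module):
    """Self-attention over the temporal axis of a (B, T, C) feature sequence."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each temporal position attends to every other position; the
        # residual connection preserves the original feature content.
        attn_out, _ = self.attn(x, x, x)
        return self.norm(x + attn_out)


if __name__ == "__main__":
    feats = torch.randn(2, 128, 256)  # 2 videos, 128 clips, 256-d features
    module = TemporalSelfAttention(channels=256)
    print(module(feats).shape)  # torch.Size([2, 128, 256])
```

In the paper's framework, a module of this kind would be only the first of three jointly trained components, followed by the multiple temporal aggregation and graph relation modules described above.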

Similar Articles

Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks.
IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5886-5902. doi: 10.1109/TPAMI.2021.3078798. Epub 2022 Aug 4.

End-to-End Temporal Action Detection With Transformer.
IEEE Trans Image Process. 2022;31:5427-5441. doi: 10.1109/TIP.2022.3195321. Epub 2022 Aug 17.
