Suppr 超能文献



Learnable Feature Augmentation Framework for Temporal Action Localization.

Authors

Tang Yepeng, Wang Weining, Zhang Chunjie, Liu Jing, Zhao Yao

Publication

IEEE Trans Image Process. 2024;33:4002-4015. doi: 10.1109/TIP.2024.3413599. Epub 2024 Jun 28.

DOI: 10.1109/TIP.2024.3413599
PMID: 38889016
Abstract

Temporal action localization (TAL) has drawn much attention in recent years; however, the performance of previous methods is still far from satisfactory due to the lack of annotated untrimmed video data. To deal with this issue, we propose to improve the utilization of current data through feature augmentation. Given an input video, we first extract video features with pre-trained video encoders, and then randomly mask various semantic contents of the video features to consider different views of them. To avoid damaging important action-related semantic information, we further develop a learnable feature augmentation framework to generate better views of videos. In particular, a Mask-based Feature Augmentation Module (MFAM) is proposed. The MFAM has three advantages: 1) it captures the temporal and semantic relationships of the original video features, 2) it generates masked features that retain indispensable action-related information, and 3) it randomly recycles some masked information to ensure diversity. Finally, we input the masked features and the original features into shared action detectors respectively, and perform action classification and localization jointly for model learning. The proposed framework can improve the robustness and generalization of action detectors by learning more and better views of videos. In the testing stage, the MFAM can be removed, so it brings no extra computational cost. Extensive experiments are conducted on four TAL benchmark datasets. Our proposed framework significantly improves different TAL models and achieves state-of-the-art performance.
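The paper's MFAM learns which feature entries to mask; as a rough illustration of the non-learnable idea it builds on — random masking of feature entries followed by randomly "recycling" (restoring) a small fraction of the masked entries for diversity — here is a minimal NumPy sketch. The function name and the ratio values are hypothetical, not taken from the paper.

```python
import numpy as np

def mask_augment(features, mask_ratio=0.5, recycle_ratio=0.1, rng=None):
    """Zero out random entries of a (T, D) feature map, then restore
    ('recycle') a random fraction of the masked entries."""
    rng = np.random.default_rng(rng)
    masked = rng.random(features.shape) < mask_ratio            # True = drop
    recycled = masked & (rng.random(features.shape) < recycle_ratio)
    keep = ~masked | recycled                                   # surviving entries
    return features * keep

# Example: a toy clip of 8 temporal snippets with 4-dim features.
feats = np.ones((8, 4))
view = mask_augment(feats, rng=0)
```

In the framework described above, such a masked view and the original features would both be fed through shared action detectors, with classification and localization losses applied jointly; at test time the augmentation is simply skipped.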


Similar Articles

1
Video Person Re-identification by Temporal Residual Learning.
IEEE Trans Image Process. 2018 Oct 29. doi: 10.1109/TIP.2018.2878505.
2
Semantic and Temporal Contextual Correlation Learning for Weakly-Supervised Temporal Action Localization.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12427-12443. doi: 10.1109/TPAMI.2023.3287208. Epub 2023 Sep 5.
3
Graph Convolutional Module for Temporal Action Localization in Videos.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6209-6223. doi: 10.1109/TPAMI.2021.3090167. Epub 2022 Sep 14.
4
AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization.
IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):1852-1863. doi: 10.1109/TNNLS.2019.2962815. Epub 2023 Apr 4.
5
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.
IEEE Trans Image Process. 2022;31:6937-6950. doi: 10.1109/TIP.2022.3217368. Epub 2022 Nov 8.
6
Multimodal and multiscale feature fusion for weakly supervised video anomaly detection.
Sci Rep. 2024 Oct 1;14(1):22835. doi: 10.1038/s41598-024-73462-0.
7
A Temporal-Aware Relation and Attention Network for Temporal Action Localization.
IEEE Trans Image Process. 2022;31:4746-4760. doi: 10.1109/TIP.2022.3182866. Epub 2022 Jul 14.
8
StochasticFormer: Stochastic Modeling for Weakly Supervised Temporal Action Localization.
IEEE Trans Image Process. 2023;32:1379-1389. doi: 10.1109/TIP.2023.3244411. Epub 2023 Feb 23.
9
Semisupervised feature selection via spline regression for video semantic recognition.
IEEE Trans Neural Netw Learn Syst. 2015 Feb;26(2):252-64. doi: 10.1109/TNNLS.2014.2314123.