
TN-ZSTAD: Transferable Network for Zero-Shot Temporal Activity Detection.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3848-3861. doi: 10.1109/TPAMI.2022.3183586. Epub 2023 Feb 3.

DOI: 10.1109/TPAMI.2022.3183586
PMID: 35709117
Abstract

An integral part of video analysis and surveillance is temporal activity detection, which means simultaneously recognizing and localizing activities in long untrimmed videos. Currently, the most effective methods of temporal activity detection are based on deep learning, and they typically perform very well when large-scale annotated videos are available for training. However, these methods are limited in real applications due to the unavailability of videos for certain activity classes and the time-consuming data annotation. To solve this challenging problem, we propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training still need to be detected. We design an end-to-end deep transferable network, TN-ZSTAD, as the architecture for this solution. On the one hand, this network utilizes an activity graph transformer to predict a set of activity instances that appear in the video, rather than producing many activity proposals in advance. On the other hand, this network captures the common semantics of seen and unseen activities from their corresponding label embeddings, and it is optimized with an innovative loss function that jointly considers the classification property on seen activities and the transfer property on unseen activities. Experiments on the THUMOS'14, Charades, and ActivityNet datasets show promising performance in terms of detecting unseen activities.
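The transfer mechanism the abstract describes — scoring video segments against label embeddings shared by seen and unseen classes, so that unseen activities can be recognized without any training videos — can be sketched in miniature as follows. This is an illustration of the general label-embedding idea only, not the paper's TN-ZSTAD network; the function names, the toy embeddings, and the cosine-similarity scoring are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(segment_feature, label_embeddings):
    """Score one segment against every class label embedding.
    Unseen classes are handled exactly like seen ones, because
    only their label embedding is needed -- no training videos.
    (Illustrative sketch, not the paper's implementation.)"""
    scores = [cosine(segment_feature, e) for e in label_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy 3-d "semantic space": made-up embeddings for two seen
# classes and one class unseen at training time.
label_embeddings = [
    [1.0, 0.1, 0.0],   # 0: "long jump"  (seen)
    [0.0, 1.0, 0.1],   # 1: "basketball" (seen)
    [0.9, 0.0, 0.5],   # 2: "high jump"  (unseen)
]
segment = [0.8, 0.05, 0.6]  # feature vector of one video segment
best, scores = zero_shot_classify(segment, label_embeddings)
# → best == 2: the unseen "high jump" class scores highest,
#   i.e. the segment is matched to a class never seen in training.
```

In the actual paper this matching is learned end-to-end with a loss that balances classifying seen activities and transferring to unseen ones; the sketch above only shows why a shared embedding space makes that transfer possible.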


Similar Articles

1. TN-ZSTAD: Transferable Network for Zero-Shot Temporal Activity Detection.
IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3848-3861. doi: 10.1109/TPAMI.2022.3183586. Epub 2023 Feb 3.
2. Two-Stream Region Convolutional 3D Network for Temporal Activity Detection.
IEEE Trans Pattern Anal Mach Intell. 2019 Oct;41(10):2319-2332. doi: 10.1109/TPAMI.2019.2921539. Epub 2019 Jun 7.
3. Unsupervised Action Proposals Using Support Vector Classifiers for Online Video Processing.
Sensors (Basel). 2020 May 22;20(10):2953. doi: 10.3390/s20102953.
4. Transformer-Based Approach Via Contrastive Learning for Zero-Shot Detection.
Int J Neural Syst. 2023 Jul;33(7):2350035. doi: 10.1142/S0129065723500351. Epub 2023 Jun 14.
5. Multi-label zero-shot learning with graph convolutional networks.
Neural Netw. 2020 Dec;132:333-341. doi: 10.1016/j.neunet.2020.09.010. Epub 2020 Sep 21.
6. Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12601-12617. doi: 10.1109/TPAMI.2023.3274139. Epub 2023 Sep 5.
7. Deep Learning-Based Action Detection in Untrimmed Videos: A Survey.
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4302-4320. doi: 10.1109/TPAMI.2022.3193611. Epub 2023 Mar 7.
8. Structured Label Inference for Visual Understanding.
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1257-1271. doi: 10.1109/TPAMI.2019.2893215. Epub 2019 Jan 16.
9. Deep Motion Prior for Weakly-Supervised Temporal Action Localization.
IEEE Trans Image Process. 2022;31:5203-5213. doi: 10.1109/TIP.2022.3193752. Epub 2022 Aug 4.
10. Video Salient Object Detection via Fully Convolutional Networks.
IEEE Trans Image Process. 2018;27(1):38-49. doi: 10.1109/TIP.2017.2754941.

Cited By

1. Denoising Vanilla Autoencoder for RGB and GS Images with Gaussian Noise.
Entropy (Basel). 2023 Oct 20;25(10):1467. doi: 10.3390/e25101467.
2. Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer.
Sci Rep. 2023 Sep 5;13(1):14624. doi: 10.1038/s41598-023-39744-9.
3. An Underwater Image Enhancement Method for a Preprocessing Framework Based on Generative Adversarial Network.
Sensors (Basel). 2023 Jun 21;23(13):5774. doi: 10.3390/s23135774.
4. A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction.
Sensors (Basel). 2023 May 17;23(10):4828. doi: 10.3390/s23104828.
5. Detection and Classification of Histopathological Breast Images Using a Fusion of CNN Frameworks.
Diagnostics (Basel). 2023 May 11;13(10):1700. doi: 10.3390/diagnostics13101700.
6. Dynamic Path Planning of AGV Based on Kinematical Constraint A* Algorithm and Following DWA Fusion Algorithms.
Sensors (Basel). 2023 Apr 19;23(8):4102. doi: 10.3390/s23084102.