

Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals.

Affiliations

Department of Computer Science, Graduate School, Kyonggi University, 154-42 Gwanggyosan-ro Yeongtong-gu, Suwon-si 16227, Korea.

Department of Computer Science, Kyonggi University, 154-42 Gwanggyosan-ro Yeongtong-gu, Suwon-si 16227, Korea.

Publication Information

Sensors (Basel). 2019 Mar 3;19(5):1085. doi: 10.3390/s19051085.

DOI: 10.3390/s19051085
PMID: 30832433
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6427216/
Abstract

This paper proposes a novel deep neural network model for solving the spatio-temporal-action-detection problem, by localizing all multiple-action regions and classifying the corresponding actions in an untrimmed video. The proposed model uses a spatio-temporal region proposal method to effectively detect multiple-action regions. First, in the temporal region proposal, anchor boxes were generated by targeting regions expected to potentially contain actions. Unlike the conventional temporal region proposal methods, the proposed method uses a complementary two-stage method to effectively detect the temporal regions of the respective actions occurring asynchronously. In addition, to detect a principal agent performing an action among the people appearing in a video, the spatial region proposal process was used. Further, coarse-level features contain comprehensive information of the whole video and have been frequently used in conventional action-detection studies. However, they cannot provide detailed information of each person performing an action in a video. In order to overcome the limitation of coarse-level features, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Various experiments conducted using the LIRIS-HARL and UCF-101 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
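The temporal region proposal step described in the abstract rests on a generic mechanism: sliding multi-scale anchor windows along the frame axis and keeping those with sufficient temporal overlap with candidate action segments. The following is a minimal illustrative sketch of that mechanism, not the paper's implementation; the function names, anchor scales, stride, and IoU threshold are all assumptions chosen for clarity.

```python
# Illustrative sketch of multi-scale temporal anchor generation and
# temporal-IoU scoring, the generic mechanism behind temporal region proposals.

def generate_temporal_anchors(num_frames, scales=(8, 16, 32), stride=4):
    """Slide anchor windows of several lengths over the frame axis."""
    anchors = []
    for scale in scales:
        for start in range(0, num_frames - scale + 1, stride):
            anchors.append((start, start + scale))  # [start, end) in frames
    return anchors

def temporal_iou(a, b):
    """Intersection-over-union of two [start, end) frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def propose(anchors, segment, iou_threshold=0.5):
    """Keep anchors that overlap a candidate action segment enough."""
    return [a for a in anchors if temporal_iou(a, segment) >= iou_threshold]

anchors = generate_temporal_anchors(num_frames=64)
proposals = propose(anchors, segment=(20, 36))
```

In a full detector these surviving temporal anchors would be intersected with per-frame spatial proposals to form the action tubes from which fine-level features are extracted.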


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/aeb31034b140/sensors-19-01085-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/67961ea2cf2a/sensors-19-01085-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/b707e095ef2a/sensors-19-01085-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/4f9874d57e6e/sensors-19-01085-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/31deaf0e50f7/sensors-19-01085-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/fd52a06c98a2/sensors-19-01085-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/c7d9b489c41c/sensors-19-01085-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b8f/6427216/986c3592db08/sensors-19-01085-g008.jpg

Similar Articles

1. Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals.
Sensors (Basel). 2019 Mar 3;19(5):1085. doi: 10.3390/s19051085.
2. YoTube: Searching Action Proposal Via Recurrent and Static Regression Networks.
IEEE Trans Image Process. 2018 Jun;27(6):2609-2622. doi: 10.1109/TIP.2018.2806279.
3. Online action proposal generation using spatio-temporal attention network.
Neural Netw. 2022 Sep;153:518-529. doi: 10.1016/j.neunet.2022.06.032. Epub 2022 Jun 30.
4. Deep Learning-Based Action Detection in Untrimmed Videos: A Survey.
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4302-4320. doi: 10.1109/TPAMI.2022.3193611. Epub 2023 Mar 7.
5. RecapNet: Action Proposal Generation Mimicking Human Cognitive Process.
IEEE Trans Cybern. 2021 Dec;51(12):6017-6028. doi: 10.1109/TCYB.2020.2965196. Epub 2021 Dec 22.
6. AMS-Net: Modeling Adaptive Multi-Granularity Spatio-Temporal Cues for Video Action Recognition.
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18731-18745. doi: 10.1109/TNNLS.2023.3321141. Epub 2024 Dec 2.
7. Confidence-Guided Self Refinement for Action Prediction in Untrimmed Videos.
IEEE Trans Image Process. 2020 Apr 17. doi: 10.1109/TIP.2020.2987425.
8. Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation.
Sensors (Basel). 2021 May 2;21(9):3164. doi: 10.3390/s21093164.
9. Unsupervised Action Proposals Using Support Vector Classifiers for Online Video Processing.
Sensors (Basel). 2020 May 22;20(10):2953. doi: 10.3390/s20102953.
10. Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization.
IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4477-4490. doi: 10.1109/TPAMI.2020.2997860. Epub 2021 Nov 3.

Cited By

1. MCMNET: Multi-Scale Context Modeling Network for Temporal Action Detection.
Sensors (Basel). 2023 Aug 31;23(17):7563. doi: 10.3390/s23177563.
2. Deep Learning-Based Real-Time Multiple-Person Action Recognition System.
Sensors (Basel). 2020 Aug 23;20(17):4758. doi: 10.3390/s20174758.

References

1. Human Pose Estimation from Monocular Images: A Comprehensive Survey.
Sensors (Basel). 2016 Nov 25;16(12):1966. doi: 10.3390/s16121966.
2. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.
3. 3D convolutional neural networks for human action recognition.
IEEE Trans Pattern Anal Mach Intell. 2013 Jan;35(1):221-31. doi: 10.1109/TPAMI.2012.59.