
Real-time multiple spatiotemporal action localization and prediction approach using deep learning.

Affiliations

Faculty of Computers and Artificial Intelligence, Cairo University, Egypt; Member of Scientific Research Group in Egypt (SRGE), Egypt.

Publication Information

Neural Netw. 2020 Aug;128:331-344. doi: 10.1016/j.neunet.2020.05.017. Epub 2020 May 19.

DOI: 10.1016/j.neunet.2020.05.017
PMID: 32470798
Abstract

Detecting the locations of multiple actions in videos and classifying them in real time is a challenging problem termed the "action localization and prediction" problem. Convolutional neural networks (ConvNets) have achieved great success in action localization and prediction in still images. A major advance occurred when the AlexNet architecture was introduced in the ImageNet competition. ConvNets have since achieved state-of-the-art performance across a wide variety of machine vision tasks, including object detection, image segmentation, image classification, facial recognition, human pose estimation, and tracking. However, few works address action localization and prediction in videos. Current action localization research focuses primarily on the classification of temporally trimmed videos in which only one action occurs per frame. Moreover, nearly all current approaches work only offline and are too slow to be useful in real-world environments. In this work, we propose a fast and accurate deep-learning approach that performs real-time action localization and prediction. The proposed approach uses convolutional neural networks to localize multiple actions and predict their classes in real time. It starts by using appearance and motion detection networks ("you only look once" (YOLO) networks) in a two-stream model to localize and classify actions from RGB frames and optical-flow frames. We then propose a fusion step that increases the localization accuracy of the approach. Moreover, we generate an action tube based on frame-level detections. The frame-by-frame processing enables early action detection and prediction with top performance in terms of detection speed and precision. The experimental results demonstrate the superiority of our approach in both processing time and accuracy compared with recent offline and online action localization and prediction approaches on the challenging UCF-101-24 and J-HMDB-21 benchmarks.
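The pipeline the abstract describes (per-frame detections from an appearance stream and a motion stream, a fusion step, and greedy linking of frame-level detections into action tubes) can be illustrated with a minimal sketch. The specific fusion rule (score boosting on class-matched overlap), the IoU thresholds, and the greedy linking strategy below are illustrative assumptions, not the authors' exact method:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple       # (x1, y1, x2, y2)
    score: float
    label: str       # action class

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse_streams(rgb_dets, flow_dets, iou_thr=0.5):
    """Fuse one frame's appearance (RGB) and motion (optical-flow)
    detections: boost an RGB detection's score when a same-class flow
    detection overlaps it; keep unmatched detections from both streams."""
    fused, used = [], set()
    for d in rgb_dets:
        best, best_iou = None, iou_thr
        for j, m in enumerate(flow_dets):
            if j in used or m.label != d.label:
                continue
            o = iou(d.box, m.box)
            if o >= best_iou:
                best, best_iou = j, o
        if best is not None:
            used.add(best)
            fused.append(Detection(d.box, d.score + flow_dets[best].score, d.label))
        else:
            fused.append(d)
    fused.extend(m for j, m in enumerate(flow_dets) if j not in used)
    return fused

def link_tubes(frames, iou_thr=0.3):
    """Greedily link per-frame detections of the same class into action
    tubes: each detection extends the overlapping tube whose last box it
    matches best, or starts a new tube."""
    tubes = []  # each tube is a list of Detection, one per linked frame
    for dets in frames:
        extended = set()
        for d in sorted(dets, key=lambda d: d.score, reverse=True):
            best, best_iou = None, iou_thr
            for t in tubes:
                if id(t) in extended or t[-1].label != d.label:
                    continue
                o = iou(t[-1].box, d.box)
                if o >= best_iou:
                    best, best_iou = t, o
            if best is not None:
                best.append(d)
                extended.add(id(best))
            else:
                tubes.append([d])
    return tubes
```

Because fusion and linking both operate on one frame at a time, this structure runs online: each incoming frame updates the tubes immediately, which is what permits early prediction of an action before the video ends.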


Similar Articles

1
Real-time multiple spatiotemporal action localization and prediction approach using deep learning.
Neural Netw. 2020 Aug;128:331-344. doi: 10.1016/j.neunet.2020.05.017. Epub 2020 May 19.
2
Online Localization and Prediction of Actions and Interactions.
IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):459-472. doi: 10.1109/TPAMI.2018.2797266. Epub 2018 Jan 23.
3
Deep Manifold Learning Combined With Convolutional Neural Networks for Action Recognition.
IEEE Trans Neural Netw Learn Syst. 2018 Sep;29(9):3938-3952. doi: 10.1109/TNNLS.2017.2740318. Epub 2017 Sep 15.
4
A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification.
Int J Med Inform. 2018 Sep;117:44-54. doi: 10.1016/j.ijmedinf.2018.06.003. Epub 2018 Jun 18.
5
Automated Video Behavior Recognition of Pigs Using Two-Stream Convolutional Networks.
Sensors (Basel). 2020 Feb 17;20(4):1085. doi: 10.3390/s20041085.
6
Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models.
Comput Intell Neurosci. 2018 Feb 12;2018:1639561. doi: 10.1155/2018/1639561. eCollection 2018.
7
Analysis of Movement and Activities of Handball Players Using Deep Neural Networks.
J Imaging. 2023 Apr 13;9(4):80. doi: 10.3390/jimaging9040080.
8
Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection.
IEEE Trans Med Imaging. 2017 Jul;36(7):1542-1549. doi: 10.1109/TMI.2017.2665671. Epub 2017 Feb 8.
9
Training-Based Methods for Comparison of Object Detection Methods for Visual Object Tracking.
Sensors (Basel). 2018 Nov 16;18(11):3994. doi: 10.3390/s18113994.
10
A Deep Learning-Based End-to-End Composite System for Hand Detection and Gesture Recognition.
Sensors (Basel). 2019 Nov 30;19(23):5282. doi: 10.3390/s19235282.