
An Efficient Human Instance-Guided Framework for Video Action Recognition.

Affiliations

Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea.

Clova AI Research, NAVER Corporation, Seongnam 13561, Korea.

Publication Info

Sensors (Basel). 2021 Dec 12;21(24):8309. doi: 10.3390/s21248309.

DOI: 10.3390/s21248309
PMID: 34960404
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8709376/
Abstract

In recent years, human action recognition has been studied by many computer vision researchers. Recent studies have attempted to use two-stream networks with appearance and motion features, but most of these approaches focused on clip-level video action recognition. In contrast to traditional methods, which generally used entire images, we propose a new human instance-level video action recognition framework. In this framework, we represent the instance-level features using human boxes and keypoints, and our action region features are used as the inputs of the temporal action head network, which makes our framework more discriminative. We also propose novel temporal action head networks consisting of various modules, which reflect various temporal dynamics well. In the experiment, the proposed models achieve comparable performance with the state-of-the-art approaches on two challenging datasets. Furthermore, we evaluate the proposed features and networks to verify their effectiveness. Finally, we analyze the confusion matrix and visualize the recognized actions at the human instance level when several people are present.
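The pipeline the abstract describes — crop per-person region features from each frame, then feed the stacked clip of instance features to a temporal action head — can be sketched roughly as below. This is a minimal illustrative sketch only, not the paper's actual modules: the RoI pooling is simplified, the temporal head is a plain mean-over-time plus linear classifier, and the box, weights, and class count are made-up placeholders.

```python
import numpy as np

def roi_pool(feature_map, box, out_size=2):
    """Crop a person box from a (C, H, W) feature map and max-pool it
    into a fixed (C, out_size, out_size) grid (simplified RoI pooling)."""
    c, _, _ = feature_map.shape
    x1, y1, x2, y2 = box
    region = feature_map[:, y1:y2, x1:x2]
    ys = np.linspace(0, region.shape[1], out_size + 1, dtype=int)
    xs = np.linspace(0, region.shape[2], out_size + 1, dtype=int)
    pooled = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            pooled[:, i, j] = region[:, ys[i]:ys[i + 1],
                                     xs[j]:xs[j + 1]].max(axis=(1, 2))
    return pooled

def temporal_action_head(clip_features, weights):
    """Average one instance's features over time, then score actions
    with a linear layer (a stand-in for the paper's temporal modules)."""
    pooled = clip_features.mean(axis=0).ravel()   # (C * S * S,)
    return pooled @ weights                       # (num_actions,)

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16, 32, 32))   # T=8 frame feature maps
box = (4, 4, 20, 24)                            # one tracked person box (x1, y1, x2, y2)
per_frame = np.stack([roi_pool(f, box) for f in frames])  # (8, 16, 2, 2)
weights = rng.standard_normal((16 * 2 * 2, 5))  # 5 hypothetical action classes
scores = temporal_action_head(per_frame, weights)
print(scores.shape)  # (5,)
```

Running the head once per detected person is what makes the recognition instance-level: each tracked box yields its own clip of pooled features and its own action scores.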


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cf1/8709376/bde6d8d825ee/sensors-21-08309-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cf1/8709376/05c6fd3e059e/sensors-21-08309-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cf1/8709376/ce9ea7bf48dc/sensors-21-08309-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cf1/8709376/f2380be8bf45/sensors-21-08309-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cf1/8709376/b888b4fbe14f/sensors-21-08309-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cf1/8709376/9ffb39607068/sensors-21-08309-g006.jpg

Similar Articles

1. An Efficient Human Instance-Guided Framework for Video Action Recognition.
Sensors (Basel). 2021 Dec 12;21(24):8309. doi: 10.3390/s21248309.
2. Energy-Guided Temporal Segmentation Network for Multimodal Human Action Recognition.
Sensors (Basel). 2020 Aug 19;20(17):4673. doi: 10.3390/s20174673.
3. A Comprehensive Review of Recent Deep Learning Techniques for Human Activity Recognition.
Comput Intell Neurosci. 2022 Apr 20;2022:8323962. doi: 10.1155/2022/8323962. eCollection 2022.
4. A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset.
Sensors (Basel). 2022 Sep 9;22(18):6841. doi: 10.3390/s22186841.
5. Adaptive Attention Memory Graph Convolutional Networks for Skeleton-Based Action Recognition.
Sensors (Basel). 2021 Oct 12;21(20):6761. doi: 10.3390/s21206761.
6. Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition.
Sensors (Basel). 2023 Feb 3;23(3):1707. doi: 10.3390/s23031707.
7. Skeleton-Based Action Recognition Based on Distance Vector and Multihigh View Adaptive Networks.
Comput Intell Neurosci. 2021 Aug 18;2021:1507770. doi: 10.1155/2021/1507770. eCollection 2021.
8. Deep Attention Network for Egocentric Action Recognition.
IEEE Trans Image Process. 2019 Aug;28(8):3703-3713. doi: 10.1109/TIP.2019.2901707. Epub 2019 Feb 26.
9. Attention-Based Temporal Encoding Network with Background-Independent Motion Mask for Action Recognition.
Comput Intell Neurosci. 2021 Mar 27;2021:8890808. doi: 10.1155/2021/8890808. eCollection 2021.
10. Lightweight Semantic-Guided Neural Networks Based on Single Head Attention for Action Recognition.
Sensors (Basel). 2022 Nov 28;22(23):9249. doi: 10.3390/s22239249.

Cited By

1. Wearable Sensors and Artificial Intelligence for the Diagnosis of Parkinson's Disease.
J Clin Med. 2025 Jun 13;14(12):4207. doi: 10.3390/jcm14124207.
2. Artificial Intelligence in Endoscopic Ultrasonography-Guided Fine-Needle Aspiration/Biopsy (EUS-FNA/B) for Solid Pancreatic Lesions: Opportunities and Challenges.
Diagnostics (Basel). 2023 Sep 26;13(19):3054. doi: 10.3390/diagnostics13193054.

References

1. Action Recognition Using Close-Up of Maximum Activation and ETRI-Activity3D LivingLab Dataset.
Sensors (Basel). 2021 Oct 12;21(20):6774. doi: 10.3390/s21206774.
2. A Hybrid Network for Large-Scale Action Recognition from RGB and Depth Modalities.
Sensors (Basel). 2020 Jun 10;20(11):3305. doi: 10.3390/s20113305.
3. Action-Stage Emphasized Spatio-Temporal VLAD for Video Action Recognition.
IEEE Trans Image Process. 2019 Jan 3. doi: 10.1109/TIP.2018.2890749.
4. Deep CNN-Based Blind Image Quality Predictor.
IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):11-24. doi: 10.1109/TNNLS.2018.2829819. Epub 2018 Jun 12.
5. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.
IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):677-691. doi: 10.1109/TPAMI.2016.2599174. Epub 2016 Sep 1.
6. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.
7. Cross-View Action Recognition via Transferable Dictionary Learning.
IEEE Trans Image Process. 2016 May;25(6):2542-56. doi: 10.1109/TIP.2016.2548242.
8. Foveated video compression with optimal rate control.
IEEE Trans Image Process. 2001;10(7):977-92. doi: 10.1109/83.931092.