Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition

Authors

Ren Ziliang, Xiao Xiongjiang, Nie Huabei

Affiliations

School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China.

School of Artificial Intelligence, Dongguan City University, Dongguan 523419, China.

Publication

Sensors (Basel). 2024 Nov 30;24(23):7682. doi: 10.3390/s24237682.

DOI:10.3390/s24237682
PMID:39686219
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11645029/
Abstract

Action recognition based on 3D heatmap volumes has received increasing attention recently because it is suitable for application to 3D CNNs to improve the recognition performance of deep networks. However, it is difficult for models to capture global dependencies due to their restricted receptive field. To effectively capture long-range dependencies and balance computations, a novel model, PoseTransformer3D with Global Cross Blocks (GCBs), is proposed for pose-based action recognition. The proposed model extracts spatio-temporal features from processed 3D heatmap volumes. Moreover, we design a further recognition framework, RGB-PoseTransformer3D with Global Cross Complementary Blocks (GCCBs), for multimodality feature learning from both pose and RGB data. To verify the effectiveness of this model, we conducted extensive experiments on four popular video datasets, namely FineGYM, HMDB51, NTU RGB+D 60, and NTU RGB+D 120. Experimental results show that the proposed recognition framework always achieves state-of-the-art recognition performance, substantially improving multimodality learning through action recognition.
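The "3D heatmap volumes" the abstract builds on are stacks of per-joint Gaussian heatmaps rasterized frame by frame, a representation popularized by PoseC3D-style pipelines. As a rough illustration of that input format only — not the paper's actual preprocessing; the function name, default sigma, and tensor shapes here are assumptions — a minimal sketch:

```python
import numpy as np

def keypoints_to_heatmap_volume(keypoints, num_frames, height, width, sigma=0.6):
    """Rasterize per-frame 2D keypoints into a 3D heatmap volume.

    keypoints: array of shape (num_frames, num_joints, 2) holding (x, y)
    pixel coordinates. Returns a float32 volume of shape
    (num_joints, num_frames, height, width), where each frame slice
    contains a Gaussian bump centred on the corresponding joint.
    """
    num_joints = keypoints.shape[1]
    volume = np.zeros((num_joints, num_frames, height, width), dtype=np.float32)
    ys = np.arange(height, dtype=np.float32)[:, None]  # column of row indices
    xs = np.arange(width, dtype=np.float32)[None, :]   # row of column indices
    for t in range(num_frames):
        for j in range(num_joints):
            x, y = keypoints[t, j]
            # Unnormalized Gaussian peaking at the joint location.
            volume[j, t] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return volume
```

Stacking joints along the channel axis like this is what makes the representation directly consumable by a 3D CNN such as the PoseTransformer3D backbone described above, which convolves jointly over the temporal and spatial axes.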


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/7adf44d40237/sensors-24-07682-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/69d58ef3b114/sensors-24-07682-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/aec2a37fd7f9/sensors-24-07682-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/050214751cb7/sensors-24-07682-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/25fcdab914e4/sensors-24-07682-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/70c7f206a08c/sensors-24-07682-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e886/11645029/0068d070e3cf/sensors-24-07682-g007.jpg

Similar Articles

1
Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition.
Sensors (Basel). 2024 Nov 30;24(23):7682. doi: 10.3390/s24237682.
2
A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera.
Sensors (Basel). 2020 Mar 25;20(7):1825. doi: 10.3390/s20071825.
3
Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition.
Front Neurorobot. 2022 Dec 15;16:1091361. doi: 10.3389/fnbot.2022.1091361. eCollection 2022.
4
A discriminative multi-modal adaptation neural network model for video action recognition.
Neural Netw. 2025 May;185:107114. doi: 10.1016/j.neunet.2024.107114. Epub 2025 Jan 3.
5
Pose-Appearance Relational Modeling for Video Action Recognition.
IEEE Trans Image Process. 2023;32:295-308. doi: 10.1109/TIP.2022.3228156. Epub 2022 Dec 21.
6
Action-Stage Emphasized Spatio-Temporal VLAD for Video Action Recognition.
IEEE Trans Image Process. 2019 Jan 3. doi: 10.1109/TIP.2018.2890749.
7
A Survey on 3D Skeleton-Based Action Recognition Using Learning Method.
Cyborg Bionic Syst. 2024 May 16;5:0100. doi: 10.34133/cbsystems.0100. eCollection 2024.
8
3D network with channel excitation and knowledge distillation for action recognition.
Front Neurorobot. 2023 Mar 23;17:1050167. doi: 10.3389/fnbot.2023.1050167. eCollection 2023.
9
MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos.
IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3522-3538. doi: 10.1109/TPAMI.2022.3177813. Epub 2023 Feb 3.
10
A Two-Stream Method for Human Action Recognition Using Facial Action Cues.
Sensors (Basel). 2024 Oct 23;24(21):6817. doi: 10.3390/s24216817.
