Hayat Ullah, Arslan Munir
Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA.
J Imaging. 2023 Jun 26;9(7):130. doi: 10.3390/jimaging9070130.
Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance on video analytics tasks. However, these newly introduced methods focus exclusively on either model performance or computational efficiency, resulting in a biased trade-off between robustness and efficiency in their proposed solutions to the challenging HAR problem. To enhance both accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose a dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel-spatial attention mechanism to extract human-centric salient features from video frames. The dual channel-spatial attention layers, together with the convolutional layers, learn to be more selective toward the spatial receptive fields that contain objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward- and backward-pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, where the obtained results verify the effectiveness of our proposed framework (DA-CNN+Bi-GRU) over state-of-the-art methods in terms of model accuracy and inference runtime across each dataset. Experimental results show that the DA-CNN+Bi-GRU framework attains an improvement in execution time of up to 167× in terms of frames per second compared with most contemporary action-recognition methods.
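To make the described pipeline concrete, the following is a minimal PyTorch sketch of the abstract's two components: a per-frame CNN with unified channel-spatial attention (here realized as a CBAM-style channel-then-spatial module, an assumed design) feeding a stacked Bi-GRU for temporal modeling. All layer widths, the backbone depth, the reduction ratio, and the class names (`ChannelSpatialAttention`, `DACNNBiGRU`) are illustrative assumptions, not the paper's exact architecture.

```python
# A hedged sketch of a DA-CNN+Bi-GRU-style pipeline; layer sizes and the
# attention design are assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel attention followed by spatial attention (assumed)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over per-pixel channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)  # channel re-weighting
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * self.spatial_conv(pooled)  # spatial re-weighting


class DACNNBiGRU(nn.Module):
    """Per-frame attentional CNN features fed to a stacked Bi-GRU classifier."""

    def __init__(self, num_classes: int, feat_dim: int = 128, hidden: int = 256):
        super().__init__()
        # Toy two-stage backbone with attention after each conv stage.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            ChannelSpatialAttention(64),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            ChannelSpatialAttention(feat_dim),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, feat_dim, 1, 1)
        )
        # Two stacked bi-directional GRU layers for long-term temporal modeling.
        self.bigru = nn.GRU(feat_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) stacked video frames
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).flatten(1)  # (B*T, feat_dim)
        seq, _ = self.bigru(feats.view(b, t, -1))              # (B, T, 2*hidden)
        return self.classifier(seq[:, -1])                     # last-step logits


# Usage example: score a batch of 2 clips of 16 frames at 112x112 resolution.
if __name__ == "__main__":
    model = DACNNBiGRU(num_classes=101)
    logits = model(torch.randn(2, 16, 3, 112, 112))
    print(logits.shape)  # torch.Size([2, 101])
```

Processing each frame independently through the attention CNN and deferring all temporal reasoning to a lightweight recurrent head is one plausible reading of why such a cascade can be fast at inference time: the per-frame backbone dominates the cost and parallelizes over frames, while the Bi-GRU adds only a small sequential pass.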