

3D network with channel excitation and knowledge distillation for action recognition.

Authors

Hu Zhengping, Mao Jianzeng, Yao Jianxin, Bi Shuai

Affiliations

School of Information Science and Engineering, Yanshan University, Qinhuangdao, China.

Hebei Key Laboratory of Information Transmission and Signal Processing, Qinhuangdao, China.

Publication

Front Neurorobot. 2023 Mar 23;17:1050167. doi: 10.3389/fnbot.2023.1050167. eCollection 2023.

Abstract

Modern action recognition techniques frequently employ two networks: a spatial stream, which takes RGB frames as input, and a temporal stream, which takes optical flow as input. Recent research uses 3D convolutional neural networks that apply spatiotemporal filters on both streams. Although fusing flow with RGB improves performance, accurate optical flow computation is expensive and adds latency to action recognition. In this study, we present a method for training a 3D CNN on RGB frames that mimics the motion stream and therefore requires no flow computation at test time. First, in contrast to the SE block, we propose a channel excitation (CE) module. Experiments show that the CE module improves the feature extraction capability of a 3D network and outperforms the SE block. Second, for action recognition training, we adopt a linear combination of a knowledge-distillation loss and the standard cross-entropy loss to effectively leverage both appearance and motion information. We call the stream trained with this combined loss the Intensified Motion RGB Stream (IMRS). IMRS surpasses either RGB or flow as a single stream; for example, on HMDB51 it achieves 73.5% accuracy, while the RGB and flow streams achieve 65.6% and 69.1%, respectively. Extensive experiments confirm the effectiveness of the proposed method, and comparison with other models shows that it is competitive in action recognition.
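The abstract contrasts the proposed CE module with the SE block but does not describe the CE module's internal design. As background, the sketch below implements the baseline SE-style squeeze-and-excitation gating on a 3D (C, T, H, W) feature map in NumPy; the function name, weight shapes, and reduction ratio are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block_3d(feat, w1, w2):
    """SE-style channel gating for a 3D feature map.

    feat: (C, T, H, W) feature map
    w1:   (C // r, C) bottleneck weight (r = reduction ratio)
    w2:   (C, C // r) expansion weight
    """
    # Squeeze: global average pool over the spatiotemporal dimensions.
    z = feat.mean(axis=(1, 2, 3))            # (C,)
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # (C,)
    # Rescale each channel by its learned attention weight.
    return feat * s[:, None, None, None]
```

The paper's CE module is reported to outperform this formulation; only the squeeze-excite-rescale pattern shown here is common to both.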

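The training objective is described only as a linear mix of a knowledge-distillation loss and standard cross-entropy. A minimal NumPy sketch, assuming the standard Hinton-style formulation; the temperature `T` and mixing weight `alpha` are hypothetical hyperparameters not given in the abstract:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_loss(student_logits, teacher_logits, label, alpha=0.5, T=4.0):
    # Standard cross-entropy against the hard ground-truth label.
    ce = -np.log(softmax(student_logits)[label])
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 (Hinton-style distillation).
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = (T ** 2) * np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    # Linear mix of the two terms.
    return (1 - alpha) * ce + alpha * kd
```

In the paper's setting, the teacher would be a flow-trained motion stream and the student the RGB stream, so flow is needed only during training, not at test time.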

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bbb/10076829/b64a5b393edb/fnbot-17-1050167-g0001.jpg
