

3D network with channel excitation and knowledge distillation for action recognition.

Authors

Hu Zhengping, Mao Jianzeng, Yao Jianxin, Bi Shuai

Affiliations

School of Information Science and Engineering, Yanshan University, Qinhuangdao, China.

Hebei Key Laboratory of Information Transmission and Signal Processing, Qinhuangdao, China.

Publication

Front Neurorobot. 2023 Mar 23;17:1050167. doi: 10.3389/fnbot.2023.1050167. eCollection 2023.

Abstract

Modern action recognition techniques frequently employ two networks: a spatial stream, which takes RGB frames as input, and a temporal stream, which takes optical flow as input. Recent research uses 3D convolutional neural networks that apply spatiotemporal filters on both streams. Although fusing flow with RGB improves performance, accurate optical flow computation is expensive and adds latency to action recognition. In this study, we present a method for training a 3D CNN on RGB frames that mimics the motion stream and therefore requires no flow computation at test time. First, in contrast to the SE block, we propose a channel excitation (CE) module. Experiments show that the CE module improves the feature extraction capability of a 3D network and outperforms the SE block. Second, for action recognition training, we adopt a linear combination of a knowledge-distillation loss and the standard cross-entropy loss to effectively leverage both appearance and motion information. We call the stream trained with this combined loss the Intensified Motion RGB Stream (IMRS). IMRS surpasses either RGB or flow as a single stream; for example, on HMDB51 it achieves 73.5% accuracy, while the RGB and flow streams achieve 65.6% and 69.1%, respectively. Extensive experiments confirm the effectiveness of the proposed method, and comparison with other models shows that it is competitive in action recognition.
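The abstract contrasts the proposed CE module with the SE block but does not describe the CE module's internal design. As background, the sketch below implements the baseline SE-style squeeze-and-excitation gating on a 3D (C, T, H, W) feature map in NumPy; the function name, weight shapes, and reduction ratio are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block_3d(feat, w1, w2):
    """SE-style channel gating for a 3D feature map.

    feat: (C, T, H, W) feature map
    w1:   (C // r, C) bottleneck weight (r = reduction ratio)
    w2:   (C, C // r) expansion weight
    """
    # Squeeze: global average pool over the spatiotemporal dimensions.
    z = feat.mean(axis=(1, 2, 3))            # (C,)
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # (C,)
    # Rescale each channel by its learned attention weight.
    return feat * s[:, None, None, None]
```

The paper's CE module is reported to outperform this formulation; only the squeeze-excite-rescale pattern shown here is common to both.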

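The training objective is described only as a linear mix of a knowledge-distillation loss and standard cross-entropy. A minimal NumPy sketch, assuming the standard Hinton-style formulation; the temperature `T` and mixing weight `alpha` are hypothetical hyperparameters not given in the abstract:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_loss(student_logits, teacher_logits, label, alpha=0.5, T=4.0):
    # Standard cross-entropy against the hard ground-truth label.
    ce = -np.log(softmax(student_logits)[label])
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 (Hinton-style distillation).
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = (T ** 2) * np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    # Linear mix of the two terms.
    return (1 - alpha) * ce + alpha * kd
```

In the paper's setting, the teacher would be a flow-trained motion stream and the student the RGB stream, so flow is needed only during training, not at test time.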

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bbb/10076829/b64a5b393edb/fnbot-17-1050167-g0001.jpg
