MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module.

Affiliation

Department of Computer Science, Sichuan University, Chengdu 610017, China.

Publication

Sensors (Basel). 2022 Sep 1;22(17):6595. doi: 10.3390/s22176595.

DOI:10.3390/s22176595
PMID:36081054
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9460449/
Abstract

As a sub-field of video content analysis, action recognition has received extensive attention in recent years, which aims to recognize human actions in videos. Compared with a single image, video has a temporal dimension. Therefore, it is of great significance to extract the spatio-temporal information from videos for action recognition. In this paper, an efficient network to extract spatio-temporal information with relatively low computational load (dubbed MEST) is proposed. Firstly, a motion encoder to capture short-term motion cues between consecutive frames is developed, followed by a channel-wise spatio-temporal module to model long-term feature information. Moreover, the weight standardization method is applied to the convolution layers followed by batch normalization layers to expedite the training process and facilitate convergence. Experiments are conducted on five public datasets of action recognition, Something-Something-V1 and -V2, Jester, UCF101 and HMDB51, where MEST exhibits competitive performance compared to other popular methods. The results demonstrate the effectiveness of our network in terms of accuracy, computational cost and network scales.
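The motion encoder described above captures short-term motion cues between consecutive frames. A common way this idea is formulated is differencing adjacent per-frame feature maps; the sketch below illustrates that formulation only, under the assumption of feature-level frame differencing, and is not MEST's exact design:

```python
import numpy as np

def motion_cues(features):
    """Short-term motion cues as differences between consecutive
    frames' feature maps (a common motion-encoder formulation;
    the paper's exact encoder may differ -- this is a sketch).

    features: array of shape (T, C, H, W), per-frame feature maps.
    Returns shape (T-1, C, H, W): one difference per adjacent pair.
    """
    return features[1:] - features[:-1]

# Toy example: 4 frames, 2 channels, 3x3 spatial grid.
feats = np.arange(4 * 2 * 3 * 3, dtype=np.float32).reshape(4, 2, 3, 3)
cues = motion_cues(feats)
print(cues.shape)  # (3, 2, 3, 3)
```

A static scene yields all-zero cues, so the encoder's output emphasizes exactly the regions that change between frames.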

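The abstract also mentions applying weight standardization to convolution layers that are followed by batch normalization. Weight standardization normalizes each output filter's weights to zero mean and unit variance over its fan-in before the convolution is applied; a minimal NumPy sketch of that operation (shapes and `eps` chosen for illustration):

```python
import numpy as np

def standardize_weights(w, eps=1e-5):
    """Weight standardization for a conv kernel of shape
    (out_channels, in_channels, kH, kW): each output filter is
    rescaled to zero mean and unit variance over its fan-in.
    """
    mean = w.mean(axis=(1, 2, 3), keepdims=True)
    var = w.var(axis=(1, 2, 3), keepdims=True)
    return (w - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))   # 8 filters, fan-in 4*3*3
w_hat = standardize_weights(w)
print(np.allclose(w_hat.mean(axis=(1, 2, 3)), 0.0, atol=1e-6))  # True
```

In practice this is applied to the kernel on every forward pass (before the convolution), which smooths the loss landscape and, as the abstract notes, speeds up training and convergence when paired with batch normalization.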

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/8029582924f8/sensors-22-06595-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/5044cfd983c1/sensors-22-06595-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/765821633669/sensors-22-06595-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/292fc7e4201a/sensors-22-06595-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/3e4f94fbd00f/sensors-22-06595-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/072a23981f6e/sensors-22-06595-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/983fd530d5d2/sensors-22-06595-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/d0eca2df43fb/sensors-22-06595-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/fd6176e5a516/sensors-22-06595-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08b/9460449/1ea426038f1d/sensors-22-06595-g010.jpg

Similar Articles

1
MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module.
Sensors (Basel). 2022 Sep 1;22(17):6595. doi: 10.3390/s22176595.
2
A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention.
Entropy (Basel). 2022 Mar 4;24(3):368. doi: 10.3390/e24030368.
3
A multidimensional feature fusion network based on MGSE and TAAC for video-based human action recognition.
Neural Netw. 2023 Nov;168:496-507. doi: 10.1016/j.neunet.2023.09.031. Epub 2023 Sep 22.
4
Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition.
Sensors (Basel). 2023 Feb 3;23(3):1707. doi: 10.3390/s23031707.
5
Deep Attention Network for Egocentric Action Recognition.
IEEE Trans Image Process. 2019 Aug;28(8):3703-3713. doi: 10.1109/TIP.2019.2901707. Epub 2019 Feb 26.
6
AR3D: Attention Residual 3D Network for Human Action Recognition.
Sensors (Basel). 2021 Feb 28;21(5):1656. doi: 10.3390/s21051656.
7
BQN: Busy-Quiet Net Enabled by Motion Band-Pass Module for Action Recognition.
IEEE Trans Image Process. 2022;31:4966-4979. doi: 10.1109/TIP.2022.3189810. Epub 2022 Aug 1.
8
Online action proposal generation using spatio-temporal attention network.
Neural Netw. 2022 Sep;153:518-529. doi: 10.1016/j.neunet.2022.06.032. Epub 2022 Jun 30.
9
STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video.
PLoS One. 2022 Mar 17;17(3):e0265115. doi: 10.1371/journal.pone.0265115. eCollection 2022.
10
Learning Spatio-Temporal Representations for Action Recognition: A Genetic Programming Approach.
IEEE Trans Cybern. 2016 Jan;46(1):158-70. doi: 10.1109/TCYB.2015.2399172. Epub 2015 Feb 13.

Cited By

1
Two-Level Attention Module Based on Spurious-3D Residual Networks for Human Action Recognition.
Sensors (Basel). 2023 Feb 3;23(3):1707. doi: 10.3390/s23031707.
2
WLiT: Windows and Linear Transformer for Video Action Recognition.
Sensors (Basel). 2023 Feb 2;23(3):1616. doi: 10.3390/s23031616.
3
Video Action Recognition Using Motion and Multi-View Excitation with Temporal Aggregation.

References

1
Gaze Estimation Approach Using Deep Differential Residual Network.
Sensors (Basel). 2022 Jul 21;22(14):5462. doi: 10.3390/s22145462.
2
Micro-Expression Recognition Based on Optical Flow and PCANet.
Sensors (Basel). 2022 Jun 5;22(11):4296. doi: 10.3390/s22114296.
3
ASNet: Auto-Augmented Siamese Neural Network for Action Recognition.
Entropy (Basel). 2022 Nov 15;24(11):1663. doi: 10.3390/e24111663.
Sensors (Basel). 2021 Jul 10;21(14):4720. doi: 10.3390/s21144720.
4
RGB-D Data-Based Action Recognition: A Review.
Sensors (Basel). 2021 Jun 21;21(12):4246. doi: 10.3390/s21124246.
5
AR3D: Attention Residual 3D Network for Human Action Recognition.
Sensors (Basel). 2021 Feb 28;21(5):1656. doi: 10.3390/s21051656.
6
STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV.
Sensors (Basel). 2021 Feb 5;21(4):1106. doi: 10.3390/s21041106.