Multi-Scale Attention 3D Convolutional Network for Multimodal Gesture Recognition.

Authors

Chen Huizhou, Li Yunan, Fang Huijuan, Xin Wentian, Lu Zixiang, Miao Qiguang

Affiliations

School of Computer Science and Technology, Xidian University, Xi'an 710071, China.

Xiaomi Communications, Beijing 100085, China.

Publication information

Sensors (Basel). 2022 Mar 21;22(6):2405. doi: 10.3390/s22062405.

DOI:10.3390/s22062405
PMID:35336576
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8950910/
Abstract

Gesture recognition is an important direction in computer vision research. Information from the hands is crucial in this task. However, current methods consistently achieve attention on hand regions based on estimated keypoints, which will significantly increase both time and complexity, and may lose position information of the hand due to wrong keypoint estimations. Moreover, for dynamic gesture recognition, it is not enough to consider only the attention in the spatial dimension. This paper proposes a multi-scale attention 3D convolutional network for gesture recognition, with a fusion of multimodal data. The proposed network achieves attention mechanisms both locally and globally. The local attention leverages the hand information extracted by the hand detector to focus on the hand region, and reduces the interference of gesture-irrelevant factors. Global attention is achieved in both the human-posture context and the channel context through a dual spatiotemporal attention module. Furthermore, to make full use of the differences between different modalities of data, we designed a multimodal fusion scheme to fuse the features of RGB and depth data. The proposed method is evaluated using the Chalearn LAP Isolated Gesture Dataset and the Briareo Dataset. Experiments on these two datasets prove the effectiveness of our network and show it outperforms many state-of-the-art methods.
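
The abstract names three mechanisms without giving their details: local attention driven by a hand detector, global attention over channel context, and fusion of RGB and depth features. The sketch below illustrates each idea in its simplest generic form. It is not the authors' architecture: the function names, the `background_weight` and `gamma` parameters, the box format `(x0, y0, x1, y1)`, and the normalize-then-concatenate fusion are all assumptions made for illustration, and only the channel branch of the dual attention module is shown.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def local_hand_attention(frame, box, background_weight=0.2):
    """Down-weight values outside a hand box (x0, y0, x1, y1), end-exclusive.

    `frame` is a 2D list of pixel or feature values. In a real system the box
    would come from a hand detector; here it is passed in (assumption).
    """
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x < x1 and y0 <= y < y1) else v * background_weight
         for x, v in enumerate(row)]
        for y, row in enumerate(frame)
    ]


def channel_attention(features, gamma=1.0):
    """Self-attention across channels with a residual connection.

    `features` holds C channels, each flattened over time and space (T*H*W).
    `gamma` scales the attended signal before it is added back (assumption).
    """
    out = []
    for query in features:
        # Affinity of this channel to every channel, via dot products.
        affinity = [sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(affinity)
        # Weighted mix of all channels at each flattened position.
        mixed = [sum(w * ch[i] for w, ch in zip(weights, features))
                 for i in range(len(query))]
        out.append([x + gamma * m for x, m in zip(query, mixed)])
    return out


def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def fuse_features(rgb_features, depth_features):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return l2_normalize(rgb_features) + l2_normalize(depth_features)
```

Normalizing before concatenation keeps one modality from dominating the fused vector purely because of its scale. The paper's own fusion scheme may differ.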

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b40/8950910/769f9764fb93/sensors-22-02405-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b40/8950910/ed7dc157d043/sensors-22-02405-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b40/8950910/68b16e974472/sensors-22-02405-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b40/8950910/f32ade47424c/sensors-22-02405-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b40/8950910/8bc651fa943e/sensors-22-02405-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b40/8950910/78e762b81ec6/sensors-22-02405-g006.jpg

Similar articles

1
Multi-Scale Attention 3D Convolutional Network for Multimodal Gesture Recognition.
Sensors (Basel). 2022 Mar 21;22(6):2405. doi: 10.3390/s22062405.
2
Attentive 3D-Ghost Module for Dynamic Hand Gesture Recognition with Positive Knowledge Transfer.
Comput Intell Neurosci. 2021 Nov 18;2021:5044916. doi: 10.1155/2021/5044916. eCollection 2021.
3
Dynamic gesture recognition based on 2D convolutional neural network and feature fusion.
Sci Rep. 2022 Mar 14;12(1):4345. doi: 10.1038/s41598-022-08133-z.
4
Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition.
IEEE Trans Pattern Anal Mach Intell. 2016 Aug;38(8):1583-97. doi: 10.1109/TPAMI.2016.2537340. Epub 2016 Mar 2.
5
Transformer-based hand gesture recognition from instantaneous to fused neural decomposition of high-density EMG signals.
Sci Rep. 2023 Jul 7;13(1):11000. doi: 10.1038/s41598-023-36490-w.
6
Dynamic Gesture Recognition Algorithm Based on 3D Convolutional Neural Network.
Comput Intell Neurosci. 2021 Aug 16;2021:4828102. doi: 10.1155/2021/4828102. eCollection 2021.
7
MSFF-Net: Multi-Stream Feature Fusion Network for surface electromyography gesture recognition.
PLoS One. 2022 Nov 7;17(11):e0276436. doi: 10.1371/journal.pone.0276436. eCollection 2022.
8
TMMF: Temporal Multi-Modal Fusion for Single-Stage Continuous Gesture Recognition.
IEEE Trans Image Process. 2021;30:7689-7701. doi: 10.1109/TIP.2021.3108349. Epub 2021 Sep 10.
9
Finger Gesture Spotting from Long Sequences Based on Multi-Stream Recurrent Neural Networks.
Sensors (Basel). 2020 Jan 18;20(2):528. doi: 10.3390/s20020528.
10
Research on gesture recognition algorithm based on MME-P3D.
Math Biosci Eng. 2024 Feb 5;21(3):3594-3617. doi: 10.3934/mbe.2024158.

Cited by

1
TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology.
Sensors (Basel). 2025 Jul 25;25(15):4614. doi: 10.3390/s25154614.
2
A Short Video Classification Framework Based on Cross-Modal Fusion.
Sensors (Basel). 2023 Oct 12;23(20):8425. doi: 10.3390/s23208425.
3
Multi-view and multi-scale behavior recognition algorithm based on attention mechanism.
Front Neurorobot. 2023 Sep 26;17:1276208. doi: 10.3389/fnbot.2023.1276208. eCollection 2023.
4
Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer.
Sensors (Basel). 2023 Aug 10;23(16):7066. doi: 10.3390/s23167066.
5
A Sign Language Recognition System Applied to Deaf-Mute Medical Consultation.
Sensors (Basel). 2022 Nov 24;22(23):9107. doi: 10.3390/s22239107.

References

1
Redundancy and Attention in Convolutional LSTM for Gesture Recognition.
IEEE Trans Neural Netw Learn Syst. 2019 Jun 28. doi: 10.1109/TNNLS.2019.2919764.
2
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.