
Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer.

Affiliation

Grupo de Tratamiento de Imágenes (GTI), Information Processing and Telecommunications Center, ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain.

Publication information

Sensors (Basel). 2023 Aug 10;23(16):7066. doi: 10.3390/s23167066.

DOI:10.3390/s23167066
PMID:37631602
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10459010/
Abstract

Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. The primary challenge lies in achieving real-time recognition while handling the temporal dependencies that affect recognition performance. Existing methods apply either 3D convolutional or Transformer-based architectures to estimated hand skeletons, but both have limitations. To address these challenges, a hybrid approach that combines 3D Convolutional Neural Networks (3D-CNNs) and Transformers is proposed. A 3D-CNN first computes high-level semantic skeleton embeddings that capture the local spatial and temporal characteristics of hand gestures. A Transformer network with a self-attention mechanism then captures long-range temporal dependencies in the skeleton sequence. Evaluation on the Briareo and Multimodal Hand Gesture datasets yielded accuracies of 95.49% and 97.25%, respectively. Notably, the approach runs in real time on a standard CPU, distinguishing it from methods that require specialized GPUs. The hybrid approach's real-time efficiency and high accuracy demonstrate its superiority over existing state-of-the-art methods. In summary, the hybrid 3D-CNN and Transformer approach addresses both real-time recognition and the efficient handling of temporal dependencies, outperforming existing methods in accuracy and speed.
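The two-stage idea in the abstract (local spatio-temporal embedding, then self-attention over time) can be illustrated with a toy sketch. This is not the authors' 3D-Jointsformer implementation: the function names, the averaging window that stands in for the 3D-CNN, the identity query/key/value projections, and the `class_weights` argument are all assumptions made for illustration, and nothing here is trained.

```python
import math

def local_embed(seq, kernel=3):
    """Toy stand-in for the 3D-CNN stage (assumption, not the paper's network):
    average every joint coordinate over a sliding temporal window, then
    flatten each frame into a single embedding vector."""
    n_frames, half = len(seq), kernel // 2
    n_joints, n_coords = len(seq[0]), len(seq[0][0])
    out = []
    for t in range(n_frames):
        window = seq[max(0, t - half):min(n_frames, t + half + 1)]
        out.append([sum(f[j][c] for f in window) / len(window)
                    for j in range(n_joints) for c in range(n_coords)])
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(emb):
    """Scaled dot-product self-attention across frames. Queries, keys and
    values are the embeddings themselves (identity projections, an
    assumption; a real Transformer learns these projections)."""
    scale = math.sqrt(len(emb[0]))
    weights = [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in emb])
               for q in emb]
    ctx = [[sum(w[t] * emb[t][i] for t in range(len(emb)))
            for i in range(len(emb[0]))] for w in weights]
    return ctx, weights

def classify(ctx, class_weights):
    """Mean-pool over time, then apply a linear layer and softmax.
    class_weights is a hypothetical list with one weight vector per class."""
    pooled = [sum(col) / len(ctx) for col in zip(*ctx)]
    return softmax([sum(w * x for w, x in zip(ws, pooled)) for ws in class_weights])
```

Here `seq` is a list of frames, each frame a list of joints, each joint a list of coordinates. Every frame attends to every other frame, which is how the attention stage can relate distant parts of a gesture that a fixed-size local window cannot reach.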


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc53/10459010/52d3a13b5f2f/sensors-23-07066-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc53/10459010/bce3225ef1e1/sensors-23-07066-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc53/10459010/3ea3791ec672/sensors-23-07066-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc53/10459010/0e7c67fb1a05/sensors-23-07066-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc53/10459010/a6ffe45c3cf0/sensors-23-07066-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc53/10459010/68c1c8dce32e/sensors-23-07066-g006.jpg
