
ViT-HGR: Vision Transformer-based Hand Gesture Recognition from High Density Surface EMG Signals.

Publication Information

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:5115-5119. doi: 10.1109/EMBC48229.2022.9871489.

Abstract

Recently, there has been a surge of interest in the application of Deep Learning (DL) models to autonomously perform hand gesture recognition using surface Electromyogram (sEMG) signals. Many of the existing DL models are, however, designed to be applied to sparse sEMG signals. Furthermore, due to the complex structure of these models, we typically face memory constraints, require long training times and large numbers of training samples, and need to resort to data augmentation and/or transfer learning. In this paper, for the first time (to the best of our knowledge), we investigate and design a Vision Transformer (ViT) based architecture to perform hand gesture recognition from High-Density surface EMG (HD-sEMG) signals. Intuitively speaking, we capitalize on the recent breakthrough role of the transformer architecture in tackling different complex problems, together with its potential for greater input parallelization via its attention mechanism. The proposed Vision Transformer-based Hand Gesture Recognition (ViT-HGR) framework can overcome the aforementioned training time problems and can accurately classify a large number of hand gestures from scratch without any need for data augmentation and/or transfer learning. The efficiency of the proposed ViT-HGR framework is evaluated using a recently released HD-sEMG dataset consisting of 65 isometric hand gestures. Our experiments with a 64-sample (31.25 ms) window size yield an average test accuracy of 84.62 ± 3.07%, with only 78,210 learnable parameters in the model. The compact structure of the proposed ViT-HGR framework (i.e., its significantly reduced number of trainable parameters) shows great potential for practical application in prosthetic control.
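
As a concrete illustration of the idea the abstract describes (treating a short HD-sEMG window as a sequence of patches fed through a transformer encoder with a classification head), below is a minimal PyTorch sketch. Note that a 64-sample, 31.25 ms window implies a 2048 Hz sampling rate. The electrode count (128), temporal patching scheme, layer sizes, and depth used here are illustrative assumptions, not the paper's exact ViT-HGR configuration or its 78,210-parameter count.

```python
# Minimal, illustrative PyTorch sketch of a ViT-style classifier for
# HD-sEMG windows. All dimensions below (128 electrodes, 8-sample
# temporal patches, embedding width, depth) are assumptions for
# demonstration only, not the authors' exact ViT-HGR configuration.
import torch
import torch.nn as nn


class TinyViTHGR(nn.Module):
    def __init__(self, n_channels=128, window=64, patch=8,
                 embed_dim=64, depth=2, n_heads=4, n_classes=65):
        super().__init__()
        assert window % patch == 0, "window must divide evenly into patches"
        self.patch = patch
        n_patches = window // patch               # patches along the time axis
        patch_dim = patch * n_channels            # each patch spans all electrodes

        self.patch_embed = nn.Linear(patch_dim, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, embed_dim))

        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, dim_feedforward=2 * embed_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        # x: (batch, window, n_channels), e.g. a 64-sample HD-sEMG window
        b, t, c = x.shape
        x = x.view(b, t // self.patch, self.patch * c)   # non-overlapping patches
        x = self.patch_embed(x)                          # (b, n_patches, embed_dim)
        cls = self.cls_token.expand(b, -1, -1)           # prepend a [CLS] token
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)                              # self-attention over patches
        return self.head(x[:, 0])                        # classify from [CLS]


# Quick shape check on random data standing in for a preprocessed window.
model = TinyViTHGR()
dummy = torch.randn(4, 64, 128)   # 4 windows x 64 samples x 128 electrodes
logits = model(dummy)             # -> (4, 65) gesture scores
```

Patching here is done along the time axis only, for simplicity; the actual ViT-HGR design may partition the two-dimensional electrode grid differently.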

