HGR-ViT: Hand Gesture Recognition with Vision Transformer.

Affiliations

Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia.

Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia.

Publication information

Sensors (Basel). 2023 Jun 14;23(12):5555. doi: 10.3390/s23125555.

Abstract

Hand gesture recognition (HGR) is a crucial area of research that enhances communication by overcoming language barriers and facilitating human-computer interaction. Although previous works in HGR have employed deep neural networks, they fail to encode the orientation and position of the hand in the image. To address this issue, this paper proposes HGR-ViT, a Vision Transformer (ViT) model with an attention mechanism for hand gesture recognition. A hand gesture image is first split into fixed-size patches, and each patch is mapped to an embedding. Positional embeddings are added to these patch embeddings to form learnable vectors that capture the positional information of the hand patches. The resulting sequence of vectors then serves as the input to a standard Transformer encoder, which produces the hand gesture representation. A multilayer perceptron head attached to the encoder output classifies the hand gesture into the correct class. The proposed HGR-ViT obtains accuracies of 99.98%, 99.36%, and 99.85% on the American Sign Language (ASL) dataset, the ASL with Digits dataset, and the National University of Singapore (NUS) hand gesture dataset, respectively.
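The input stage described in the abstract (patch splitting, patch embedding, positional embedding) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the 8x8 grayscale toy image, the patch size of 4, the embedding width of 6, and the random weights are all assumptions chosen for illustration, since the abstract does not state the actual hyperparameters. The class token, the Transformer encoder, and the MLP head are omitted.

```python
import random


def split_into_patches(image, patch_size):
    """Split an H x W grayscale image (list of rows) into flattened,
    non-overlapping patches, scanned row by row."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def linear_projection(patch, weights):
    """Map a flattened patch to an embedding: one dot product per weight row."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def embed_patches(image, patch_size, embed_dim, seed=0):
    """Return one token per patch: patch embedding plus positional embedding.

    The weights and positional embeddings are random stand-ins (an assumption
    of this sketch) for parameters that would be learned during training.
    """
    rng = random.Random(seed)
    patches = split_into_patches(image, patch_size)
    patch_len = patch_size * patch_size
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(patch_len)]
               for _ in range(embed_dim)]
    positions = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                 for _ in patches]
    tokens = []
    for patch, position in zip(patches, positions):
        embedding = linear_projection(patch, weights)
        # Adding the positional vector lets the encoder tell patches apart
        # by location, not only by content.
        tokens.append([e + p for e, p in zip(embedding, position)])
    return tokens


if __name__ == "__main__":
    toy_image = [[r * 8 + c for c in range(8)] for r in range(8)]
    tokens = embed_patches(toy_image, patch_size=4, embed_dim=6)
    print(len(tokens), "tokens of width", len(tokens[0]))
```

In this sketch an 8x8 image with 4x4 patches yields a sequence of four tokens. A sequence of this kind is what a standard Transformer encoder would consume.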

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9750/10303839/0cbad0cae001/sensors-23-05555-g001.jpg
