Light-Weight Deep Learning Techniques with Advanced Processing for Real-Time Hand Gesture Recognition.

Affiliations

Department of Computer Engineering, Gachon University, Seongnam 1342, Republic of Korea.

Informatics Department, Electronics Research Institute (ERI), Cairo 11843, Egypt.

Publication information

Sensors (Basel). 2022 Dec 20;23(1):2. doi: 10.3390/s23010002.

DOI:10.3390/s23010002

PMID:36616601

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9823561/

Abstract

In hand gesture and dynamic sign language recognition, deep learning approaches with high computational complexity and large numbers of parameters have achieved remarkable success. However, sign language recognition applications for mobile phones are usually tightly constrained by the limited storage and computing capacity of those devices. In light of this, we propose lightweight deep neural networks with advanced processing for real-time dynamic sign language recognition (DSLR). This paper presents a DSLR application intended to narrow the communication gap between hearing-impaired communities and the rest of society. The DSLR application was developed using two robust deep learning models, the GRU and the 1D CNN, combined with the MediaPipe framework. The authors implement advanced processing steps to address most DSLR problems, especially in real-time detection, e.g., differences in the signer's depth and location. The method consists of three main parts. First, the input dataset is preprocessed with our algorithm to standardize the number of frames. Then, the MediaPipe framework extracts hand and pose landmarks (features) to detect and locate them. Finally, after the depth and location of the body are unified, the features are passed to the models to recognize the dynamic sign language (DSL) accurately. To support this, the authors built a new video-based American sign dataset and named it DSL-46. DSL-46 contains 46 daily-used signs, presented with all the details and properties needed for recording the new dataset. Experimental results show that the presented method recognizes dynamic signs quickly and accurately, even in real-time detection. The DSLR reaches an accuracy of 98.8%, 99.84%, and 88.40% on the DSL-46, LSA64, and LIBRAS-BSL datasets, respectively.
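The abstract does not spell out how the frame count is standardized or how depth and location are unified, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes evenly spaced frame sampling for the first step, and for the last step it assumes that each frame's 2D landmarks are re-centred on the midpoint of two reference landmarks (for instance the two shoulders from a pose estimator) and divided by the distance between them. The function names `standardize_frame_count` and `normalize_frame` are hypothetical.

```python
import math


def standardize_frame_count(frames, target_len):
    """Resample a clip to exactly target_len frames (assumed strategy).

    Frames are picked at evenly spaced positions between the first and
    last frame, so short clips repeat frames and long clips drop some.
    """
    if not frames:
        raise ValueError("frames must be non-empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if target_len == 1:
        return [frames[0]]
    last = len(frames) - 1
    return [frames[round(i * last / (target_len - 1))] for i in range(target_len)]


def normalize_frame(landmarks, left_ref, right_ref):
    """Remove signer position and apparent size from one frame (assumed strategy).

    landmarks is a list of (x, y) pairs; left_ref and right_ref are the
    indices of two reference landmarks. The midpoint of the references
    becomes the origin, and the distance between them becomes 1.
    """
    lx, ly = landmarks[left_ref]
    rx, ry = landmarks[right_ref]
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    scale = math.hypot(lx - rx, ly - ry)
    if scale == 0:
        raise ValueError("reference landmarks coincide; cannot scale")
    return [((x - cx) / scale, (y - cy) / scale) for x, y in landmarks]


# Hypothetical usage: a 3-frame clip stretched to 5 frames, then normalized.
clip = [
    [(0.4, 0.5), (0.6, 0.5), (0.5, 0.7)],
    [(0.4, 0.5), (0.6, 0.5), (0.5, 0.6)],
    [(0.4, 0.5), (0.6, 0.5), (0.5, 0.5)],
]
fixed = standardize_frame_count(clip, 5)
features = [normalize_frame(frame, 0, 1) for frame in fixed]
```

With this normalization, a signer who stands farther from the camera or off-centre yields the same feature values as one who is close and centred, which is the kind of depth and location difference the abstract refers to. The resulting fixed-length sequences could then be fed to a sequence model such as a GRU or a 1D CNN.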
