Sign Language Recognition for Arabic Alphabets Using Transfer Learning Technique.

Author information

College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 21574, Saudi Arabia.

Publication information

Comput Intell Neurosci. 2022 Apr 22;2022:4567989. doi: 10.1155/2022/4567989. eCollection 2022.

DOI:10.1155/2022/4567989

PMID:35498192

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9054420/

Abstract

Sign language is essential for deaf and mute people to communicate with hearing people and with one another. Hearing people nevertheless tend to overlook the importance of sign language, which is the sole means of communication for the deaf and mute communities. Because of these impairments, members of these communities face significant setbacks in their lives, including unemployment, severe depression, and several other difficulties. One of the services they rely on for communication is sign language interpreters, but hiring interpreters is very costly, so an inexpensive alternative is needed. To address this, a system has been developed that takes a visual hand dataset based on Arabic Sign Language and interprets this visual data as textual information. The dataset consists of 54,049 images of Arabic sign language alphabets, with 1,500 images per class, and each class represents a different meaning through its hand gesture or sign. Various preprocessing and data augmentation techniques were applied to the images. Experiments were performed with various pretrained models on this dataset. Most of them performed only moderately, and in the final stage the EfficientNetB4 model was judged the best fit. Given the complexity of the dataset, the models other than EfficientNetB4 did not perform well because of their lightweight architectures, whereas EfficientNetB4 is a comparatively heavyweight and more complex architecture. The best model achieved a training accuracy of 98 percent and a testing accuracy of 95 percent.
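The transfer-learning setup summarized in the abstract can be illustrated with a minimal sketch. It assumes TensorFlow/Keras, which the abstract does not name, and every specific below is a hypothetical choice rather than the authors' configuration: the function name `build_classifier`, the augmentation layers and their strengths, the dropout rate, the learning rate, and the 380-pixel input size. The number of classes is left as a parameter because the abstract does not state it.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_classifier(num_classes, image_size=380, weights="imagenet"):
    """Frozen EfficientNetB4 backbone with a new softmax head (illustrative only)."""
    # Assumed augmentation; the abstract does not list the techniques used.
    augment = keras.Sequential(
        [
            layers.RandomRotation(0.05),
            layers.RandomZoom(0.1),
            layers.RandomTranslation(0.1, 0.1),
        ],
        name="augmentation",
    )

    # Pretrained backbone without its ImageNet classification head.
    base = keras.applications.EfficientNetB4(
        include_top=False,
        weights=weights,
        input_shape=(image_size, image_size, 3),
    )
    base.trainable = False  # keep the pretrained features fixed

    inputs = keras.Input(shape=(image_size, image_size, 3))
    x = augment(inputs)
    x = base(x, training=False)  # run batch norm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)  # assumed regularization
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),  # assumed learning rate
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

In this sketch only the new dense head is trainable at first. A common follow-up, again not confirmed by the abstract, is to unfreeze some of the upper backbone layers and continue training with a much smaller learning rate.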
