
Real-Time Arabic Sign Language Recognition Using a Hybrid Deep Learning Model.

Affiliation

Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia.

Publication

Sensors (Basel). 2024 Jun 6;24(11):3683. doi: 10.3390/s24113683.

Abstract

Sign language is an essential means of communication for individuals with hearing disabilities. However, there is a significant shortage of sign language interpreters for some languages, especially in Saudi Arabia. As a result, a large proportion of the hearing-impaired population is deprived of services, especially in public places. This paper aims to close this accessibility gap by developing systems capable of recognizing Arabic Sign Language (ArSL) using deep learning techniques. We propose a hybrid model that captures the spatio-temporal aspects of sign language (i.e., letters and words). The hybrid model consists of a Convolutional Neural Network (CNN) classifier, which extracts spatial features from sign language data, and a Long Short-Term Memory (LSTM) classifier, which extracts spatial and temporal characteristics to handle sequential data (i.e., hand movements). To demonstrate the feasibility of the proposed hybrid model, we created an ArSL dataset of 20 words: 4000 images covering 10 static gesture words, and 500 videos covering 10 dynamic gesture words. The proposed hybrid model demonstrates promising performance, with the CNN and LSTM classifiers achieving accuracy rates of 94.40% and 82.70%, respectively. These results indicate that our approach can significantly enhance communication accessibility for the hearing-impaired community in Saudi Arabia. This paper thus represents a major step toward promoting inclusivity and improving the quality of life for the hearing impaired.
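The abstract describes a two-stage pipeline: a CNN maps each frame to a spatial feature vector, and an LSTM summarizes the sequence of those vectors over time. The paper's actual architecture and hyperparameters are not given here, so the following is only a minimal pure-Python sketch of that CNN-to-LSTM hand-off; `cnn_features_stub` is a hypothetical stand-in for the CNN, and the LSTM cell implements the standard gate equations with tiny random weights.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal LSTM cell using the standard gate equations (illustrative only)."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size + 1  # [input ; previous hidden ; bias]
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.W_i = mat(hidden_size, n)  # input gate
        self.W_f = mat(hidden_size, n)  # forget gate
        self.W_o = mat(hidden_size, n)  # output gate
        self.W_g = mat(hidden_size, n)  # candidate cell state

    def step(self, x, h, c):
        z = x + h + [1.0]  # concatenate frame features, hidden state, bias term
        lin = lambda W, j: sum(w * v for w, v in zip(W[j], z))
        h_new, c_new = [], []
        for j in range(self.hidden_size):
            i = sigmoid(lin(self.W_i, j))      # how much new input to admit
            f = sigmoid(lin(self.W_f, j))      # how much old memory to keep
            o = sigmoid(lin(self.W_o, j))      # how much memory to expose
            g = math.tanh(lin(self.W_g, j))    # candidate memory content
            cj = f * c[j] + i * g
            c_new.append(cj)
            h_new.append(o * math.tanh(cj))
        return h_new, c_new

def encode_sequence(frames, cell):
    """Run the LSTM over per-frame feature vectors; return the final hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in frames:
        h, c = cell.step(x, h, c)
    return h

def cnn_features_stub(frame_id, dim=8):
    """Hypothetical stand-in for the CNN: maps a frame to a feature vector."""
    rng = random.Random(frame_id)
    return [rng.uniform(-1, 1) for _ in range(dim)]

frames = [cnn_features_stub(t) for t in range(16)]  # a 16-frame gesture clip
cell = LSTMCell(input_size=8, hidden_size=4)
embedding = encode_sequence(frames, cell)
print(len(embedding))  # prints 4: a fixed-size temporal summary of the clip
```

In a real system the final hidden state would feed a classification layer over the gesture vocabulary; the point of the sketch is only that variable-length hand-movement sequences collapse to a fixed-size vector, which is what lets the hybrid model handle dynamic gesture words.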


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016a/11175347/e66e69c30f29/sensors-24-03683-g001.jpg
