Assiri Mohammed, Selim Mahmoud M
Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, P.O. BOX 16273, 3963, Al-Kharj, Saudi Arabia.
King Salman Center for Disability Research, 11614, Riyadh, Saudi Arabia.
Sci Rep. 2025 Jul 1;15(1):21441. doi: 10.1038/s41598-025-06680-9.
Sign language (SL) is the primary language of individuals with speech and hearing impairments. Hand gestures are the principal modality of SL, used by people with speech and hearing impairments to communicate with one another and with hearing individuals. Hand gesture detection now plays a vital role and is widely employed in numerous applications worldwide. Hand gesture detection systems can support communication between humans and machines and thereby assist these groups. Machine learning (ML) is a subfield of artificial intelligence (AI) that concentrates on developing methods that learn from data. The main challenge in hand gesture detection is that machines do not directly understand human language; a standard medium is required to facilitate communication between humans and machines. Hand gesture recognition (GR) serves as this medium, enabling commands for computer interaction that particularly benefit hearing-impaired and elderly individuals. This study proposes a Gesture Recognition for Hearing Impaired People Using an Ensemble of Deep Learning Models with Improved Beluga Whale Optimization (GRHIP-EDLIBWO) model. The main intention of the GRHIP-EDLIBWO framework for GR is to serve as a valuable tool for developing accessible communication systems for hearing-impaired individuals. To accomplish this, the GRHIP-EDLIBWO method initially performs image preprocessing using a Sobel filter (SF) to enhance edge detection and extract critical gesture features. For the feature extraction process, the squeeze-and-excitation capsule network (SE-CapsNet) effectively captures spatial hierarchies and complex relationships within gesture patterns. In addition, an ensemble of classifiers comprising a bidirectional gated recurrent unit (BiGRU), a variational autoencoder (VAE), and a bidirectional long short-term memory (BiLSTM) network is employed. Finally, the improved beluga whale optimization (IBWO) method is implemented for the hyperparameter tuning of the three ensemble models. To achieve a robust classification result with the GRHIP-EDLIBWO approach, extensive simulations are conducted on an Indian SL (ISL) dataset. The performance validation of the GRHIP-EDLIBWO approach demonstrated a superior accuracy of 98.72%, outperforming existing models.
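Since the authors' implementation is not reproduced here, the following is a minimal Python sketch of two stages named in the abstract: Sobel-filter preprocessing and late fusion of the three classifiers' outputs. All function names and the uniform-averaging fusion rule are illustrative assumptions; the SE-CapsNet feature extractor and the IBWO hyperparameter tuning are omitted.

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np
from scipy import ndimage

def sobel_preprocess(gray: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude to emphasise gesture edges (assumes a 2-D grayscale image)."""
    gx = ndimage.sobel(gray, axis=0, mode="reflect")  # gradient along rows
    gy = ndimage.sobel(gray, axis=1, mode="reflect")  # gradient along columns
    mag = np.hypot(gx, gy)                            # combined edge magnitude
    return mag / (mag.max() + 1e-8)                   # normalise to [0, 1]

def ensemble_predict(p_bigru: np.ndarray, p_vae: np.ndarray, p_bilstm: np.ndarray) -> int:
    """Fuse per-class probabilities from the three ensemble members.
    Uniform averaging is an assumption; the paper may weight members differently."""
    fused = np.mean([p_bigru, p_vae, p_bilstm], axis=0)
    return int(np.argmax(fused))

# Dummy usage: one 64x64 gesture image and three 3-class probability vectors.
edges = sobel_preprocess(np.random.rand(64, 64))
label = ensemble_predict(np.array([0.7, 0.2, 0.1]),
                         np.array([0.6, 0.3, 0.1]),
                         np.array([0.5, 0.4, 0.1]))
print(edges.shape, label)
```

Probability averaging is the simplest late-fusion scheme for an ensemble of this kind; a weighted combination tuned alongside the other hyperparameters (e.g., by IBWO) would be a natural refinement consistent with the pipeline described above.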