Almjally Abrar, Algamdi Shabbab Ali, Aljohani Nasser, Nour Mohamed K
Department of Information Technology, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 13318, Saudi Arabia.
King Salman Center for Disability Research, Riyadh, 11614, Saudi Arabia.
Sci Rep. 2025 Sep 1;15(1):32255. doi: 10.1038/s41598-025-15109-2.
Speech is the primary form of human communication; however, some people have impaired hearing or speech, and communication presents a fundamental hurdle for them. Sign Languages (SLs) are the natural languages of the Deaf and their primary means of communication. As visual languages, they convey meaning through multiple parallel channels: manual features, such as hand pose, shape, and movement, as well as non-manual features, including mouth movements, head, shoulder, and torso posture, and facial expressions. SL recognition (SLR) is the complete process of tracking and recognizing the signs performed and translating them into semantically meaningful words. Much current SLR research is based on deep learning (DL) models. This paper proposes an Attention-Driven Hybrid Deep Learning Model with Feature Fusion for Accurate Sign Language Recognition (AHDLMFF-ASLR). The primary goal of the AHDLMFF-ASLR model is to improve SLR for deaf and mute individuals by utilizing advanced techniques for accurate, real-time gesture recognition. In the initial image pre-processing stage, contrast-limited adaptive histogram equalization (CLAHE) enhances image details and Canny edge detection (CED) emphasizes the edges of objects. The feature extraction process then integrates the Swin Transformer (ST), ConvNeXt-Large, and ResNet50 models. Finally, the AHDLMFF-ASLR model performs classification with a hybrid convolutional neural network and bidirectional long short-term memory with attention (C-BiL-A) technique. The performance of the AHDLMFF-ASLR technique is evaluated on an SL dataset, where a comparative study shows a superior accuracy of 98.10% over existing models.
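For concreteness, the following is a minimal sketch of the described pre-processing stage using OpenCV. The clip limit, tile grid size, and Canny thresholds are illustrative assumptions; the abstract does not report the values used.

```python
# CLAHE contrast enhancement followed by Canny edge detection, as in the
# paper's pre-processing stage. Parameter values are assumptions.
import cv2
import numpy as np

def preprocess_sign_image(path: str) -> tuple[np.ndarray, np.ndarray]:
    """Return the CLAHE-enhanced grayscale image and its Canny edge map."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # Contrast-limited adaptive histogram equalization: equalize locally
    # within 8x8 tiles, clipping the histogram to limit noise amplification.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    # Canny edge detection to emphasize object (hand) boundaries.
    edges = cv2.Canny(enhanced, threshold1=50, threshold2=150)
    return enhanced, edges
```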
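The three-backbone feature extraction could be prototyped as below with pretrained timm models. The specific checkpoint names and concatenation as the fusion operator are assumptions, since the abstract only states that the three networks are integrated.

```python
# Sketch of the Swin Transformer / ConvNeXt-Large / ResNet50 feature-fusion
# stage. num_classes=0 makes each timm backbone return pooled feature vectors.
import timm
import torch
import torch.nn as nn

class FusedFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbones = nn.ModuleList([
            timm.create_model("swin_base_patch4_window7_224",
                              pretrained=True, num_classes=0),
            timm.create_model("convnext_large", pretrained=True, num_classes=0),
            timm.create_model("resnet50", pretrained=True, num_classes=0),
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the per-backbone embeddings into one fused descriptor.
        return torch.cat([b(x) for b in self.backbones], dim=1)

x = torch.randn(2, 3, 224, 224)        # a batch of pre-processed frames
features = FusedFeatureExtractor()(x)  # shape: (2, 1024 + 1536 + 2048)
```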
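One plausible reading of the C-BiL-A classifier is a 1-D CNN over the fused frame features, a bidirectional LSTM for temporal context, and additive attention pooling before the output layer. All layer sizes and the attention formulation below are assumed for illustration; the abstract gives no architectural details.

```python
# Hypothetical PyTorch sketch of the hybrid CNN + BiLSTM-with-attention
# (C-BiL-A) classifier described in the abstract.
import torch
import torch.nn as nn

class CBiLA(nn.Module):
    def __init__(self, feat_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.conv = nn.Sequential(                 # local pattern extraction
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)  # temporal context
        self.attn = nn.Linear(2 * hidden, 1)       # additive attention scores
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) sequence of fused frame features
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.bilstm(h)                      # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention weights over time
        ctx = (w * h).sum(dim=1)                   # weighted context vector
        return self.fc(ctx)                        # class logits

# feat_dim=4608 matches the concatenated backbone features above (assumed).
logits = CBiLA(feat_dim=4608, hidden=256, num_classes=26)(
    torch.randn(2, 16, 4608))
```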