Zhang Junming, Bu Xiaolong, Wang Yushuai, Dong Hao, Zhang Yu, Wu Haitao
School of Computer and Artificial Intelligence, Huanghuai University, Zhumadian, 463000, Henan Province, China.
Key Laboratory of Intelligent Lighting, Henan Province, Zhumadian, 463000, China.
Sci Rep. 2024 May 18;14(1):11360. doi: 10.1038/s41598-024-62008-z.
Sign language is an important means of expression for people with hearing and speech impairments, so sign language recognition has long been an important research topic. However, many current sign language recognition systems require complex deep models and rely on expensive sensors, which limits where sign language recognition can be applied. To address this issue, this study proposes a lightweight, computer-vision-based dual-path background erasing deep convolutional neural network (DPCNN) model for sign language recognition. The DPCNN consists of two paths: one learns the overall features, while the other learns the background features. The background features are gradually subtracted from the overall features to obtain an effective representation of the hand features. These features are then flattened into a one-dimensional vector and passed through a fully connected layer with 128 output units. Finally, a fully connected layer with 24 output units serves as the output layer. On the ASL Finger Spelling dataset, the proposed method achieves an overall accuracy of 99.52% and a Macro-F1 score of 0.997. More importantly, the proposed method can be deployed on small terminals, broadening the application scenarios of sign language recognition. Experimental comparisons show that the proposed dual-path background erasing network model has better generalization ability.
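To make the data flow described above concrete, here is a minimal pure-Python sketch of a dual-path background-erasing forward pass. It is not the authors' implementation. Only the overall structure (an overall-feature path, a background-feature path, stage-wise subtraction of background from overall features, flattening, a 128-unit fully connected layer, and a 24-unit output layer) comes from the abstract. The 28x28 single-channel input, the 3x3 valid convolutions, one feature map per path, two erasing stages, ReLU after each subtraction, and the random untrained weights are all illustrative assumptions, and every helper name (`dpcnn_forward`, `erase_background`, and so on) is hypothetical.

```python
import math
import random

# Hypothetical sketch of a dual-path "background erasing" forward pass.
# Assumptions (NOT from the paper): 28x28 single-channel input, 3x3 valid
# convolutions, one feature map per path, two erasing stages, ReLU after
# each subtraction, random untrained weights.

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a square kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            s = 0.0
            for di in range(k):
                for dj in range(k):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

def relu_map(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def erase_background(overall, background):
    """Element-wise subtraction of background features from overall features."""
    return [[o - b for o, b in zip(ro, rb)] for ro, rb in zip(overall, background)]

def dense(vec, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(wrow, vec)) + b for wrow, b in zip(weights, biases)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng, scale):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def dpcnn_forward(image, n_stages=2, hidden=128, n_classes=24, seed=0):
    rng = random.Random(seed)
    overall, background = image, image
    for _ in range(n_stages):
        # Each path has its own (here random) kernel.
        overall = relu_map(conv2d_valid(overall, rand_matrix(3, 3, rng, 0.5)))
        background = relu_map(conv2d_valid(background, rand_matrix(3, 3, rng, 0.5)))
        # Gradually erase: subtract background features at every stage.
        overall = relu_map(erase_background(overall, background))
    flat = [v for row in overall for v in row]  # flatten to a 1D vector
    w1 = rand_matrix(hidden, len(flat), rng, 1 / math.sqrt(len(flat)))
    h = [max(0.0, v) for v in dense(flat, w1, [0.0] * hidden)]  # FC with 128 units
    w2 = rand_matrix(n_classes, hidden, rng, 1 / math.sqrt(hidden))
    return softmax(dense(h, w2, [0.0] * n_classes))  # 24-unit output layer

if __name__ == "__main__":
    rng = random.Random(42)
    img = [[rng.random() for _ in range(28)] for _ in range(28)]
    probs = dpcnn_forward(img)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    print(len(probs), predicted)
```

The loops only trace shapes under these assumptions: a 28x28 input becomes 26x26 after the first stage and 24x24 after the second, giving 576 flattened values, 128 hidden units, and 24 class probabilities. A practical model would use many channels per path, trained weights, and a deep learning framework rather than nested Python loops.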