School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
School of Rehabilitation Science and Engineering, University of Health and Rehabilitation Sciences, Qingdao 266072, China.
Sensors (Basel). 2023 Jun 21;23(13):5775. doi: 10.3390/s23135775.
Deaf and hearing-impaired people routinely face communication barriers. Sign language recognition (SLR) technology based on non-invasive surface electromyography (sEMG) sensors can help them integrate more fully into social life. Because the traditional tandem (serial) convolutional neural network (CNN) structure used in most CNN-based studies captures the features of the input data inadequately, we propose a novel inception architecture with a residual module and dilated convolution (IRDC-net) that enlarges the receptive fields and enriches the feature maps, and we apply it to SLR tasks for the first time. First, the time-domain signal was transformed into the time-frequency domain using the discrete Fourier transform. Second, an IRDC-net was constructed to recognize ten Chinese sign language signs. Third, the tandem CNN networks VGG-net and ResNet-18 were compared with our proposed parallel-structure network, IRDC-net. Finally, the public dataset Ninapro DB1 was used to verify the generalization performance of the IRDC-net. The results showed that transforming the time-domain sEMG signal into the time-frequency domain increased the classification accuracy (acc) of the IRDC-net on our sign language dataset from 84.29% to 91.70%. Furthermore, on the time-frequency information of the public dataset Ninapro DB1, the classification accuracy reached 89.82%, which is higher than the values achieved in other recent studies. As such, our findings contribute to research into SLR tasks and to improving the daily lives of deaf and hearing-impaired people.
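The first step described above, moving a time-domain signal into a time-frequency representation with the discrete Fourier transform, can be illustrated with a minimal sketch. The abstract does not state the window type, frame length, or hop size, so the Hann window, `frame_len=64`, and `hop=32` below are assumptions chosen only for illustration, as are the helper names `dft_magnitude` and `spectrogram`. The input is a synthetic sine wave, not sEMG data, and this is not the authors' implementation.

```python
import cmath
import math


def dft_magnitude(frame):
    """Return the magnitudes of the one-sided DFT of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=64, hop=32):
    """Slide a Hann window over the signal and take the DFT of each frame.

    Returns a list of frames, each a list of frequency-bin magnitudes,
    i.e. a time-frequency map. frame_len and hop are assumed values.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitude(frame))
    return frames


# Synthetic single-channel signal: a 125 Hz sine sampled at 1000 Hz
# (hypothetical values, used only to check the transform).
fs = 1000
signal = [math.sin(2 * math.pi * 125 * t / fs) for t in range(256)]

spec = spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64  # bin spacing is fs / frame_len
```

Each row of `spec` is one time step and each column one frequency bin, so the result is a 2-D map that a CNN could take as input in place of the raw 1-D signal. For a multi-channel recording, the same transform would be applied to each channel separately.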