Chen Xi, Zhang Xu, Chen Xiang, Chen Xun
IEEE Trans Neural Syst Rehabil Eng. 2023;31:2069-2078. doi: 10.1109/TNSRE.2023.3266299. Epub 2023 Apr 26.
Finer-grained decoding at the phoneme or syllable level is a key technology for continuous silent speech recognition based on the surface electromyogram (sEMG). This paper develops a novel syllable-level decoding method for continuous silent speech recognition (SSR) using a spatio-temporal end-to-end neural network. In the proposed method, high-density sEMG (HD-sEMG) signals were first converted into a series of feature images. A spatio-temporal end-to-end neural network was then applied to extract discriminative feature representations and achieve syllable-level decoding. The method was validated on HD-sEMG data from fifteen subjects who subvocalized 33 Chinese phrases consisting of 82 syllables. The data were recorded with four 64-channel electrode arrays placed over the facial and laryngeal muscles. The proposed method outperformed the benchmark methods, achieving the highest phrase classification accuracy (97.17 ± 1.53%) and a lower character error rate (3.11 ± 1.46%). This study provides a promising approach to decoding sEMG for SSR, with great potential for applications in instant communication and remote control.
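The character error rate reported above is conventionally defined as the Levenshtein (edit) distance between the decoded and reference character sequences, divided by the number of reference characters. The abstract does not give the exact formula used, so the following is a minimal sketch that assumes this standard definition. The function names `edit_distance` and `character_error_rate` are hypothetical. In Mandarin each character is generally read as one syllable, so a character-level edit distance closely tracks syllable-level decoding errors.

```python
from __future__ import annotations


def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the edit distance between the first (i - 1) chars of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r -> h (free if they match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER (assumed definition): total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical phrases for illustration only; not taken from the paper's corpus.
    refs = ["打开空调", "你好"]
    hyps = ["打开空条", "你好"]
    print(character_error_rate(refs, hyps))  # one substitution over six characters
```

This sketch pools edits across all phrases before dividing. The abstract does not state whether the paper aggregates this way, averages per-phrase rates, or uses some other scheme before reporting the mean ± standard deviation across subjects.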