Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi'an, People's Republic of China.
J Neural Eng. 2021 Aug 11;18(4). doi: 10.1088/1741-2552/ac13c0.
Directly decoding imagined speech from electroencephalogram (EEG) signals has attracted much interest in brain-computer interface applications, because it provides a natural and intuitive communication method for locked-in patients. Several methods have been applied to imagined speech decoding, but how to model the spatial-temporal dependencies and capture long-range contextual cues in EEG signals for better decoding remains an open question.

In this study, we propose a novel model called the hybrid-scale spatial-temporal dilated convolution network (HS-STDCN) for EEG-based imagined speech recognition. HS-STDCN integrates feature learning from temporal and spatial information into a unified end-to-end model. To characterize the temporal dependencies of the EEG sequences, we adopted a hybrid-scale temporal convolution layer that captures temporal information at multiple scales. A depthwise spatial convolution layer was then designed to model the intrinsic spatial relationships among EEG electrodes, producing a spatial-temporal representation of the input EEG data. On top of this representation, dilated convolution layers were employed to learn long-range discriminative features for the final classification.

To evaluate the proposed method, we compared HS-STDCN with existing methods on our collected dataset. HS-STDCN achieved an average classification accuracy of 54.31% for decoding eight imagined words, which is significantly better than the other methods at a significance level of 0.05.

The proposed HS-STDCN model provides an effective approach to exploiting both the temporal and spatial dependencies of the input EEG signals for imagined speech recognition. We also visualized the semantic differences between words to analyze the impact of word semantics on imagined speech recognition, investigated the brain regions important to the decoding process, and explored the use of fewer electrodes to achieve comparable performance.
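To make the described architecture concrete, below is a minimal PyTorch sketch of the three stages named in the abstract: parallel hybrid-scale temporal convolutions, a depthwise spatial convolution over electrodes, and stacked dilated convolutions feeding an 8-way classifier. The electrode count, filter counts, kernel scales, dilation rates, and pooling sizes are illustrative assumptions, not the paper's reported hyperparameters.

```python
# Hedged sketch of an HS-STDCN-style network, assuming 64 electrodes and
# 8 output classes (the eight imagined words). All layer sizes below are
# illustrative guesses, not the values used in the paper.
import torch
import torch.nn as nn

class HSSTDCN(nn.Module):
    def __init__(self, n_electrodes=64, n_classes=8,
                 scales=(15, 31, 63), n_filters=8, dilations=(1, 2, 4)):
        super().__init__()
        # Hybrid-scale temporal convolutions: parallel 1-D convolutions
        # over time with different kernel lengths, concatenated on the
        # channel axis to capture temporal structure at multiple scales.
        self.temporal = nn.ModuleList([
            nn.Conv2d(1, n_filters, kernel_size=(1, k), padding=(0, k // 2))
            for k in scales
        ])
        mixed = n_filters * len(scales)
        # Depthwise spatial convolution: one kernel spanning all electrodes
        # per temporal feature map (groups == in_channels makes it depthwise),
        # collapsing the electrode dimension into a spatial-temporal map.
        self.spatial = nn.Sequential(
            nn.Conv2d(mixed, mixed, kernel_size=(n_electrodes, 1), groups=mixed),
            nn.BatchNorm2d(mixed), nn.ELU(), nn.AvgPool2d((1, 4)))
        # Dilated temporal convolutions with growing dilation rates to
        # enlarge the receptive field and learn long-range features.
        self.dilated = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(mixed, mixed, kernel_size=(1, 3),
                          dilation=(1, d), padding=(0, d)),
                nn.BatchNorm2d(mixed), nn.ELU())
            for d in dilations
        ])
        self.classify = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(mixed, n_classes))

    def forward(self, x):            # x: (batch, 1, electrodes, time)
        x = torch.cat([conv(x) for conv in self.temporal], dim=1)
        x = self.spatial(x)          # electrode dimension collapses to 1
        x = self.dilated(x)          # long-range temporal context
        return self.classify(x)      # logits for the imagined words

model = HSSTDCN()
logits = model(torch.randn(2, 1, 64, 1000))  # e.g. two 4 s trials at 250 Hz
print(logits.shape)                          # torch.Size([2, 8])
```

Odd kernel lengths with `k // 2` padding keep the time axis unchanged so the multi-scale outputs can be concatenated, and the depthwise spatial step mirrors the abstract's separation of temporal and spatial feature learning.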