Continuous and discrete decoding of overt speech with scalp electroencephalography (EEG).

Author Information

Alexander Craik, Heather Dial, Jose L Contreras-Vidal

Affiliations

Department of Electrical and Computer Engineering, University of Houston, Houston, TX, United States of America.

NSF IUCRC BRAIN, University of Houston, Houston, TX, United States of America.

Publication Information

J Neural Eng. 2025 Mar 14;22(2). doi: 10.1088/1741-2552/ad8d0a.

Abstract

Objective. Neurological disorders affecting speech production adversely impact quality of life for over 7 million individuals in the US. Traditional speech interfaces like eye-tracking devices and P300 spellers are slow and unnatural for these patients. An alternative solution, speech brain-computer interfaces (BCIs), directly decodes speech characteristics, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG.

Approach. Nine neurologically intact participants were equipped with a 63-channel EEG system with additional sensors to eliminate eye artifacts. Participants read aloud sentences selected for phonetic similarity to the English language. Deep learning models, including convolutional neural networks and recurrent neural networks with and without attention modules, were optimized with a focus on minimizing trainable parameters and utilizing small input window sizes for real-time application. These models were employed for discrete and continuous speech decoding tasks.

Main results. Statistically significant participant-independent decoding performance was achieved for discrete classes and continuous characteristics of the produced audio signal. A frequency sub-band analysis highlighted the significance of certain frequency bands (delta, theta, gamma) for decoding performance, and a perturbation analysis was used to identify crucial channels. The channel selection methods assessed did not significantly improve performance, suggesting a distributed representation of speech information in the EEG signals. Leave-one-out training demonstrated the feasibility of utilizing common speech neural correlates, reducing data collection requirements from individual participants.

Significance. These findings contribute significantly to the development of EEG-enabled speech synthesis by demonstrating the feasibility of decoding both discrete and continuous speech features from EEG signals, even in the presence of EMG artifacts. By addressing the challenges of EMG interference and optimizing deep learning models for speech decoding, this study lays a strong foundation for EEG-based speech BCIs.
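The abstract describes compact convolutional and recurrent decoders that operate on short EEG windows with few trainable parameters. The sketch below is a minimal illustration of that general idea, not the authors' implementation: the framework (PyTorch), the layer sizes, the 0.5 s window length, the sampling rate, and the five-class output are all assumptions introduced for the example.

```python
# Minimal sketch (assumptions noted above): a parameter-efficient CNN that maps
# short windows of 63-channel EEG to discrete speech-class logits.

import torch
import torch.nn as nn

class CompactEEGDecoder(nn.Module):
    def __init__(self, n_channels: int = 63, n_samples: int = 128, n_classes: int = 5):
        super().__init__()
        # Temporal convolution along the sample axis, then a grouped spatial
        # convolution across the 63 electrodes, keeping the parameter count low.
        self.temporal = nn.Conv2d(1, 8, kernel_size=(1, 32), padding=(0, 16), bias=False)
        self.spatial = nn.Conv2d(8, 16, kernel_size=(n_channels, 1), groups=8, bias=False)
        self.bn = nn.BatchNorm2d(16)
        self.pool = nn.AvgPool2d(kernel_size=(1, 4))
        self.classify = nn.Linear(16 * ((n_samples + 1) // 4), n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, samples) -> add a singleton feature-map dimension
        x = x.unsqueeze(1)
        x = torch.relu(self.bn(self.spatial(self.temporal(x))))
        x = self.pool(x).flatten(start_dim=1)
        return self.classify(x)

# Example: a batch of 0.5 s windows (128 samples at an assumed 256 Hz).
model = CompactEEGDecoder()
logits = model(torch.randn(4, 63, 128))
print(logits.shape)  # torch.Size([4, 5])
```

A recurrent or attention-based variant, or a regression head for continuous audio features, would follow the same windowed input/output pattern; the participant-independent results reported in the abstract would correspond to training such a model with leave-one-participant-out splits.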
