Speech Synthesis from Stereotactic EEG using an Electrode Shaft Dependent Multi-Input Convolutional Neural Network Approach.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:6045-6048. doi: 10.1109/EMBC46164.2021.9629711.

Abstract

Neurological disorders can significantly impair speech communication and, in severe cases, cause complete loss of the ability to speak. Brain-computer interfaces (BCIs) have shown promise as an alternative communication modality by directly transforming neural activity of speech processes into textual or audible representations. Previous studies of such speech neuroprostheses relied on electrocorticography (ECoG) or microelectrode arrays, which acquire neural signals from superficial areas of the cortex. While both measurement methods have demonstrated successful speech decoding, they do not capture activity from deeper brain structures, and this activity has therefore not been harnessed for speech-related BCIs. In this study, we bridge this gap by adapting a previously presented ECoG-based decoding pipeline for speech synthesis to signals from implanted depth electrodes (stereotactic EEG, sEEG). For this purpose, we propose a multi-input convolutional neural network that extracts speech-related activity separately for each electrode shaft and estimates spectral coefficients from which an audible waveform is reconstructed. We evaluate our approach on open-loop data from 5 patients who performed a recitation task of Dutch utterances. We achieve correlations of up to 0.80 between original and reconstructed speech spectrograms, which are significantly above chance level for all patients (p < 0.001). Our results indicate that sEEG can yield speech decoding performance similar to that of prior ECoG studies and is a promising modality for speech BCIs.
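
The shaft-dependent multi-input idea described in the abstract can be illustrated with a minimal NumPy sketch of a forward pass. This is not the authors' implementation: the layer sizes, the single temporal convolution per shaft, the average pooling, the linear read-out, and all names (shaft_branch, init_model, forward, spectrogram_correlation) are assumptions chosen for illustration. The correlation helper assumes one plausible reading of the reported metric, namely Pearson correlation per spectral coefficient, averaged over coefficients.

```python
import numpy as np


def shaft_branch(x, kernels):
    """Filter one shaft's signals over time, apply ReLU, then average over time.

    x:       (n_contacts, n_samples) neural features from a single electrode shaft
    kernels: (n_filters, n_contacts, width) temporal filters for that shaft
    returns: (n_filters,) non-negative feature vector
    """
    n_filters, _, width = kernels.shape
    n_out = x.shape[1] - width + 1
    out = np.empty((n_filters, n_out))
    for t in range(n_out):
        # Each filter sees all contacts of this shaft over `width` samples.
        out[:, t] = np.tensordot(kernels, x[:, t:t + width], axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0).mean(axis=1)


def init_model(contacts_per_shaft, rng, n_filters=8, width=5, n_coeffs=40):
    """Create one filter bank per shaft plus a shared linear read-out (hypothetical sizes)."""
    branches = [rng.normal(scale=0.1, size=(n_filters, c, width))
                for c in contacts_per_shaft]
    weights = rng.normal(scale=0.1, size=(n_coeffs, n_filters * len(contacts_per_shaft)))
    bias = np.zeros(n_coeffs)
    return branches, weights, bias


def forward(shaft_inputs, model):
    """Map one window of per-shaft inputs to a vector of spectral coefficients."""
    branches, weights, bias = model
    # Every shaft is processed by its own branch; features are merged afterwards.
    features = np.concatenate(
        [shaft_branch(x, k) for x, k in zip(shaft_inputs, branches)]
    )
    return weights @ features + bias


def spectrogram_correlation(original, reconstructed):
    """Mean Pearson correlation across spectral coefficients.

    Both inputs have shape (n_frames, n_coeffs). Averaging per-coefficient
    correlations is an assumption about how the metric is aggregated.
    """
    o = original - original.mean(axis=0)
    r = reconstructed - reconstructed.mean(axis=0)
    denom = np.sqrt((o ** 2).sum(axis=0) * (r ** 2).sum(axis=0))
    return float(np.mean((o * r).sum(axis=0) / denom))


rng = np.random.default_rng(0)
contacts_per_shaft = [8, 10, 12]  # shafts may carry different numbers of contacts
model = init_model(contacts_per_shaft, rng)
window = [rng.normal(size=(c, 20)) for c in contacts_per_shaft]
coeffs = forward(window, model)
print(coeffs.shape)  # (40,)
```

Giving each shaft its own branch lets shafts with different numbers of contacts be handled without padding, and keeps the learned filters specific to one implantation trajectory before the features are merged. A trained model would be applied to successive windows to produce a spectrogram, which could then be compared with the original using spectrogram_correlation.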
