Herff Christian, Diener Lorenz, Angrick Miguel, Mugler Emily, Tate Matthew C, Goldrick Matthew A, Krusienski Dean J, Slutzky Marc W, Schultz Tanja
School of Mental Health & Neuroscience, Maastricht University, Maastricht, Netherlands.
Cognitive Systems Lab, University of Bremen, Bremen, Germany.
Front Neurosci. 2019 Nov 22;13:1267. doi: 10.3389/fnins.2019.01267. eCollection 2019.
Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that preserves conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
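The unit-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the function names (`nearest_unit`, `synthesize`) and the toy data are all assumptions made for illustration. Each incoming neural feature frame is matched to its closest training frame, and the audio unit recorded alongside that training frame is appended to the output. Practical unit-selection systems usually also score how well neighbouring units join and smooth the boundaries between them, which is omitted here.

```python
import math


def nearest_unit(frame, train_feats):
    """Return the index of the training frame closest to `frame`.

    Euclidean distance is an assumption; any similarity measure could be used.
    """
    best_i, best_d = 0, math.inf
    for i, ref in enumerate(train_feats):
        d = math.dist(frame, ref)
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def synthesize(test_feats, train_feats, train_units):
    """Concatenate the audio unit paired with each best-matching training frame."""
    waveform = []
    for frame in test_feats:
        waveform.extend(train_units[nearest_unit(frame, train_feats)])
    return waveform


# Toy data (hypothetical): three training frames of 2-D neural features,
# each paired with a two-sample audio unit from the speaker's own voice.
train_feats = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
train_units = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Unseen neural frames select units 1, 2 and 0, in that order.
test_feats = [(0.9, 1.1), (1.9, 0.1), (0.1, -0.1)]
waveform = synthesize(test_feats, train_feats, train_units)
```

Because the output is assembled from snippets of the speaker's own recorded audio, it inherits that speaker's voice characteristics, which is the property the abstract highlights.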