Littlejohn Kaylo T, Cho Cheol Jun, Liu Jessie R, Silva Alexander B, Yu Bohan, Anderson Vanessa R, Kurtz-Miott Cady M, Brosler Samantha, Kashyap Anshul P, Hallinan Irina P, Shah Adit, Tu-Chan Adelyn, Ganguly Karunesh, Moses David A, Chang Edward F, Anumanchipalli Gopala K
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.
Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.
Nat Neurosci. 2025 Apr;28(4):902-912. doi: 10.1038/s41593-025-01905-6. Epub 2025 Mar 31.
Natural spoken communication happens instantaneously. Speech delays longer than a few seconds can disrupt the natural flow of conversation. This makes it difficult for individuals with paralysis to participate in meaningful dialogue, potentially leading to feelings of isolation and frustration. Here we used high-density surface recordings of the speech sensorimotor cortex in a clinical trial participant with severe paralysis and anarthria to drive a continuously streaming naturalistic speech synthesizer. We designed and used deep learning recurrent neural network transducer models to achieve online large-vocabulary intelligible fluent speech synthesis personalized to the participant's preinjury voice with neural decoding in 80-ms increments. Offline, the models demonstrated implicit speech detection capabilities and could continuously decode speech indefinitely, enabling uninterrupted use of the decoder and further increasing speed. Our framework also successfully generalized to other silent-speech interfaces, including single-unit recordings and electromyography. Our findings introduce a speech-neuroprosthetic paradigm to restore naturalistic spoken communication to people with paralysis.