Center of Integrative Neurosciences, University of California, San Francisco, CA, United States of America.
These authors contributed equally to this work.
J Neural Eng. 2020 Dec 16;17(6). doi: 10.1088/1741-2552/abc742.
Decoding language representations directly from the brain can enable new brain-computer interfaces (BCIs) for high-bandwidth human-human and human-machine communication. Clinically, such technologies can restore communication in people with neurological conditions that affect their ability to speak. In this study, we propose a novel deep network architecture, Brain2Char, for directly decoding text (specifically, character sequences) from intracranial brain recordings (electrocorticography, ECoG). The Brain2Char framework combines state-of-the-art deep learning modules: 3D Inception layers for multiband spatiotemporal feature extraction from neural data, bidirectional recurrent and dilated convolution layers to decode character sequences, a connectionist temporal classification (CTC) loss for optimization, and a language-model-weighted beam search over the output. Additionally, given the highly non-linear transformations that underlie the conversion of cortical function to character sequences, we regularize the network's latent representations, motivated by insights into the cortical encoding of speech production and by artifactual aspects specific to ECoG data acquisition. To do this, we impose auxiliary losses on latent representations for articulatory movements, speech acoustics, and session-specific non-linearities. In three (out of four) participants reported here, Brain2Char achieves word error rates of 10.6%, 8.5%, and 7.0%, respectively, on vocabulary sizes ranging from 1200 to 1900 words. These results establish a new end-to-end approach to decoding text from brain signals and demonstrate the potential of Brain2Char as a high-performance communication BCI.