Wilson Guy H, Stavisky Sergey D, Willett Francis R, Avansino Donald T, Kelemen Jessica N, Hochberg Leigh R, Henderson Jaimie M, Druckmann Shaul, Shenoy Krishna V
Neurosciences Graduate Program, Stanford University, Stanford, CA, United States of America.
Department of Neurosurgery, Stanford University, Stanford, CA, United States of America.
J Neural Eng. 2020 Nov 25;17(6):066007. doi: 10.1088/1741-2552/abbfef.
To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the 'hand knob' area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak.
Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically-implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode's binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the 'Brain-to-Speech' pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes' onset times.
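To make the decoding setup concrete, below is a minimal sketch of phoneme classification from binned neural features with a linear decoder. This is not the authors' pipeline: the data here are synthetic, and all shapes, names, and the choice of multinomial logistic regression are illustrative assumptions standing in for "a linear decoder over binned action potential counts."

```python
# Minimal sketch (not the authors' pipeline): linear classification of phoneme
# identity from windows of binned spike counts around each phoneme onset.
# All data below are synthetic and all shapes/names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials, n_electrodes, n_bins, n_phonemes = 800, 192, 5, 39  # 2 x 96-electrode arrays
X = rng.poisson(lam=2.0, size=(n_trials, n_electrodes, n_bins))  # synthetic spike counts
y = rng.integers(0, n_phonemes, size=n_trials)                   # synthetic phoneme labels

# Flatten the time-varying window into one feature vector per trial so the
# linear model can use temporal structure around the labeled phoneme onset.
X_flat = X.reshape(n_trials, -1).astype(float)

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X_flat, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.3f} (chance ~ {1 / n_phonemes:.3f})")
```

With random features as above, cross-validated accuracy sits near the 1/39 chance level, which is the baseline against which the reported decoding accuracies should be read.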
A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while an RNN classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but could be mitigated by acoustic artifact subtraction and using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio.
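For the synthesis result, one common way to score a reconstruction is the Pearson correlation between true and reconstructed audio. The sketch below illustrates that metric only; the signals are synthetic and the paper's exact feature representation and scoring procedure may differ.

```python
# Illustrative only: Pearson correlation between a true and a reconstructed
# signal, the kind of metric behind a reported r between true and synthesized
# audio. Signals here are synthetic.
import numpy as np

def pearson_r(true_sig: np.ndarray, recon_sig: np.ndarray) -> float:
    """Pearson correlation between two equal-length 1-D signals."""
    a = true_sig - true_sig.mean()
    b = recon_sig - recon_sig.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A noisy copy of a 220 Hz tone gives a high but imperfect correlation.
t = np.linspace(0.0, 1.0, 16000)
true_audio = np.sin(2 * np.pi * 220 * t)
recon_audio = true_audio + 0.8 * np.random.default_rng(0).standard_normal(t.size)
print(f"r = {pearson_r(true_audio, recon_audio):.3f}")
```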
The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.