

Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus.

Author information

Wilson Guy H, Stavisky Sergey D, Willett Francis R, Avansino Donald T, Kelemen Jessica N, Hochberg Leigh R, Henderson Jaimie M, Druckmann Shaul, Shenoy Krishna V

Affiliations

Neurosciences Graduate Program, Stanford University, Stanford, CA, United States of America.

Department of Neurosurgery, Stanford University, Stanford, CA, United States of America.

Publication information

J Neural Eng. 2020 Nov 25;17(6):066007. doi: 10.1088/1741-2552/abbfef.

Abstract

OBJECTIVE

To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the 'hand knob' area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak.

APPROACH

Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode's binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the 'Brain-to-Speech' pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes' onset times.
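As a rough illustration of the feature pipeline described above, the sketch below bins spike times into per-bin counts and classifies the resulting feature vectors with a nearest-class-mean rule, which is one simple kind of linear classifier. It is not the decoder used in the study: the function names, the bin width, and the toy data are all assumptions made for illustration, and the abstract does not specify the linear decoder's form.

```python
def bin_spike_counts(spike_times, t_start, bin_width, n_bins):
    """Count spikes in n_bins consecutive bins of bin_width seconds,
    starting at t_start (e.g. a labeled phoneme onset). Hypothetical helper."""
    counts = [0] * n_bins
    for t in spike_times:
        idx = int((t - t_start) // bin_width)
        if 0 <= idx < n_bins:  # ignore spikes outside the window
            counts[idx] += 1
    return counts


def fit_centroids(features, labels):
    """Compute the mean feature vector for each phoneme label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


if __name__ == "__main__":
    # Toy example: two "electrodes", one bin each, two phoneme classes.
    train_x = [[5, 0], [7, 0], [0, 4], [0, 6]]
    train_y = ["AA", "AA", "IY", "IY"]
    model = fit_centroids(train_x, train_y)
    print(predict(model, [5, 1]))
    print(bin_spike_counts([0.05, 0.15, 0.17, 0.45], 0.0, 0.1, 5))
```

In practice the feature vector for one phoneme would concatenate the binned counts (or high-frequency power values) from every electrode; keeping several bins per electrode is one way to retain time-varying structure in the features.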

MAIN RESULTS

A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while an RNN classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but could be mitigated by acoustic artifact subtraction and using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio.
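The synthesis result above is reported as a correlation coefficient. A minimal sketch of the Pearson correlation between a true and a reconstructed signal follows; the abstract does not state which audio representation the correlation was computed over, so the inputs here are generic numeric sequences and the function name is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    true_signal = [1.0, 2.0, 3.0, 4.0]
    reconstructed = [1.0, 3.0, 2.0, 4.0]
    print(round(pearson_r(true_signal, reconstructed), 3))  # prints 0.8
```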

SIGNIFICANCE

The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.




