

Brain-to-text: decoding spoken phrases from phone representations in the brain.

Author information

Herff Christian, Heger Dominic, de Pesters Adriana, Telaar Dominic, Brunner Peter, Schalk Gerwin, Schultz Tanja

Author affiliations

Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA.

Publication information

Front Neurosci. 2015 Jun 12;9:217. doi: 10.3389/fnins.2015.00217. eCollection 2015.

Abstract

It has long been speculated whether communication between humans and machines based on natural-speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
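The abstract's core idea — modeling single phones from neural activity and decoding frame-by-frame in ASR style — can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption, not the authors' implementation: features are simulated stand-ins for ECoG broadband-gamma frames, each phone is modeled as a diagonal Gaussian, and decoding is frame-wise maximum likelihood with repeated labels collapsed (the paper's actual system uses full ASR machinery with a language model).

```python
import numpy as np

# Toy sketch of phone-based neural decoding (illustrative only).
rng = np.random.default_rng(0)
phones = ["S", "AH", "T"]

# Simulated "ECoG" features: each phone gets its own 4-d mean.
means = {p: rng.normal(size=4) * 3 for p in phones}

def simulate_frames(seq, frames_per_phone=20):
    """Generate noisy feature frames for a phone sequence."""
    X, y = [], []
    for p in seq:
        X.append(means[p] + rng.normal(scale=0.5, size=(frames_per_phone, 4)))
        y += [p] * frames_per_phone
    return np.vstack(X), y

X_train, y_train = simulate_frames(phones * 5)

# Fit one diagonal Gaussian per phone (stand-in for ASR acoustic models).
models = {}
for p in phones:
    Xp = X_train[[i for i, lab in enumerate(y_train) if lab == p]]
    models[p] = (Xp.mean(axis=0), Xp.var(axis=0) + 1e-6)

def log_likelihood(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def decode(X):
    """Frame-wise maximum-likelihood phone labels, repeats collapsed."""
    labels = [max(phones, key=lambda p: log_likelihood(x, *models[p]))
              for x in X]
    return [labels[0]] + [b for a, b in zip(labels, labels[1:]) if b != a]

X_test, _ = simulate_frames(["S", "AH", "T"])
print(decode(X_test))
```

A phone error rate, as reported in the abstract, would then be the edit distance between the decoded and spoken phone sequences divided by the length of the spoken sequence.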


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3881/4464168/ee1cfdff2a01/fnins-09-00217-g0001.jpg
