

Brain-to-text: decoding spoken phrases from phone representations in the brain.

Author information

Herff Christian, Heger Dominic, de Pesters Adriana, Telaar Dominic, Brunner Peter, Schalk Gerwin, Schultz Tanja

Author affiliations

Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA.

Publication information

Front Neurosci. 2015 Jun 12;9:217. doi: 10.3389/fnins.2015.00217. eCollection 2015.

Abstract

It has long been speculated whether communication between humans and machines based on natural-speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
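The abstract's core idea — modeling single phones from neural activity and decoding frame-by-frame in ASR style — can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption, not the authors' implementation: features are simulated stand-ins for ECoG broadband-gamma frames, each phone is modeled as a diagonal Gaussian, and decoding is frame-wise maximum likelihood with repeated labels collapsed (the paper's actual system uses full ASR machinery with a language model).

```python
import numpy as np

# Toy sketch of phone-based neural decoding (illustrative only).
rng = np.random.default_rng(0)
phones = ["S", "AH", "T"]

# Simulated "ECoG" features: each phone gets its own 4-d mean.
means = {p: rng.normal(size=4) * 3 for p in phones}

def simulate_frames(seq, frames_per_phone=20):
    """Generate noisy feature frames for a phone sequence."""
    X, y = [], []
    for p in seq:
        X.append(means[p] + rng.normal(scale=0.5, size=(frames_per_phone, 4)))
        y += [p] * frames_per_phone
    return np.vstack(X), y

X_train, y_train = simulate_frames(phones * 5)

# Fit one diagonal Gaussian per phone (stand-in for ASR acoustic models).
models = {}
for p in phones:
    Xp = X_train[[i for i, lab in enumerate(y_train) if lab == p]]
    models[p] = (Xp.mean(axis=0), Xp.var(axis=0) + 1e-6)

def log_likelihood(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def decode(X):
    """Frame-wise maximum-likelihood phone labels, repeats collapsed."""
    labels = [max(phones, key=lambda p: log_likelihood(x, *models[p]))
              for x in X]
    return [labels[0]] + [b for a, b in zip(labels, labels[1:]) if b != a]

X_test, _ = simulate_frames(["S", "AH", "T"])
print(decode(X_test))
```

A phone error rate, as reported in the abstract, would then be the edit distance between the decoded and spoken phone sequences divided by the length of the spoken sequence.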


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3881/4464168/ee1cfdff2a01/fnins-09-00217-g0001.jpg
