Herff Christian, Schultz Tanja
Cognitive Systems Lab, Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany.
Front Neurosci. 2016 Sep 27;10:429. doi: 10.3389/fnins.2016.00429. eCollection 2016.
Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of disturbing bystanders, or an inability to produce speech (e.g., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition (ASR) technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for ASR from neural signals due to their low temporal resolution, but are very useful for investigating the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the system.