Department of Physics and Center for Neural Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A.
Neural Comput. 2014 Mar;26(3):523-56. doi: 10.1162/NECO_a_00557. Epub 2013 Dec 9.
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one uses a hidden Markov model-based recognizer, and the other uses a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
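The template-based decoding described in the abstract can be illustrated with a minimal sketch. It assumes each spike sequence is reduced to the ordered list of integer labels of the neurons that fired. The abstract does not spell out the normalization or the decision rule, so two choices here are assumptions made for illustration: the longest-common-subsequence (LCS) length is divided by the length of the longer sequence, and the word is chosen by its best-matching template. The function names (`lcs_length`, `similarity`, `recognize`) are hypothetical.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(seq: Sequence[int], template: Sequence[int]) -> float:
    """LCS-based similarity in [0, 1].

    Assumption: normalize by the longer of the two sequences; the
    abstract only says the measure is based on the LCS length.
    """
    longest = max(len(seq), len(template))
    return lcs_length(seq, template) / longest if longest else 0.0


def recognize(seq: Sequence[int],
              templates: Dict[str, List[Sequence[int]]]) -> str:
    """Return the word whose best template is most similar to seq.

    Assumption: a best-match decision rule over all templates of a word.
    """
    return max(
        templates,
        key=lambda word: max(similarity(seq, t) for t in templates[word]),
    )


# Toy example: neuron labels are arbitrary integers.
templates = {"one": [[3, 7, 1, 4]], "two": [[5, 2, 8, 6]]}
noisy = [3, 9, 7, 4]  # one spurious spike (9) and one missing spike (1)
word = recognize(noisy, templates)
```

In this toy case the noisy sequence still shares the ordered subsequence 3, 7, 4 with the template for "one". This suggests why a subsequence-based measure can tolerate the inserted and deleted spikes that noise tends to produce.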