Wu Jibin, Yılmaz Emre, Zhang Malu, Li Haizhou, Tan Kay Chen
Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore.
Faculty of Computer Science and Mathematics, University of Bremen, Bremen, Germany.
Front Neurosci. 2020 Mar 17;14:199. doi: 10.3389/fnins.2020.00199. eCollection 2020.
Artificial neural networks (ANNs) have become the mainstream acoustic modeling technique for large-vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. Brain-inspired spiking neural networks (SNNs) closely mimic biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energy efficiency and rapid information-processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance in several large-vocabulary recognition scenarios. The experimental results demonstrate ASR accuracies competitive with their ANN counterparts, while requiring only 10 algorithmic time steps and as few as 0.68 times the total synaptic operations to classify each audio frame. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware therefore offers an attractive solution for ASR applications running locally on mobile and embedded devices.