Efficient Binary Weight Convolutional Network Accelerator for Speech Recognition.

Affiliation

School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Publication information

Sensors (Basel). 2023 Jan 30;23(3):1530. doi: 10.3390/s23031530.

DOI:10.3390/s23031530

PMID:36772567

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9920974/

Abstract

Speech recognition has progressed tremendously within artificial intelligence (AI). However, neural network accelerators for real-time, offline Chinese speech recognition on edge AI devices still need better performance. This paper proposes a configurable convolutional neural network accelerator based on a lightweight speech recognition model, which dramatically reduces hardware resource consumption while maintaining an acceptable error rate. In the convolutional layers, the weights are binarized to reduce the number of model parameters and improve computational and storage efficiency. A multichannel shared computation (MCSC) architecture is proposed to maximize the reuse of weight and feature map data. A binary weight-sharing processing engine (PE) is designed so that the architecture is not limited by the number of multipliers. A custom instruction set is established to configure parameters according to the variable length of the voice input and to adapt to different network structures. Finally, ping-pong storage is used for the input feature maps. The accelerator was implemented on a Xilinx ZYNQ XC7Z035 at a working frequency of 150 MHz. The processing times for 2.24 s and 8 s of speech were 69.8 ms and 189.51 ms, respectively, and the convolution performance reached 35.66 GOPS/W. Compared with other computing platforms, the accelerator performs better in terms of energy efficiency, power consumption, and hardware resource consumption.
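
To illustrate why binarized weights reduce the need for multipliers, the following is a minimal software sketch, not the paper's hardware design. It assumes a common binarization scheme (the sign of each weight plus one scaling factor equal to the mean absolute weight); the abstract does not state which scheme the authors use. The names `binarize` and `binary_conv1d` are hypothetical, and a 1-D sliding window stands in for the real convolutional layers.

```python
def binarize(weights):
    """Return (signs, alpha): signs in {+1, -1}, alpha = mean absolute weight.

    Assumed scheme for illustration; the paper may binarize differently.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha


def binary_conv1d(signal, signs, alpha):
    """Valid-mode 1-D sliding-window sum using only add/subtract per tap."""
    k = len(signs)
    out = []
    for start in range(len(signal) - k + 1):
        acc = 0.0
        for offset, s in enumerate(signs):
            x = signal[start + offset]
            # A +1/-1 weight turns each multiply into an add or a subtract.
            acc = acc + x if s > 0 else acc - x
        out.append(alpha * acc)  # a single scaling multiply per output value
    return out


weights = [0.5, -0.25, 0.75]   # hypothetical full-precision kernel
signal = [1.0, 2.0, 3.0, 4.0]  # hypothetical input feature values
signs, alpha = binarize(weights)
result = binary_conv1d(signal, signs, alpha)
print(signs, alpha, result)
```

In this sketch each kernel tap costs one addition or subtraction, and each weight needs only one bit of storage, which is the general reason binary-weight designs save multipliers and memory.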
