Weng Yui-Kai, Huang Shih-Hsu, Kao Hsu-Yu
Department of Electronic Engineering, Chung Yuan Christian University, Taoyuan 32023, Taiwan.
Sensors (Basel). 2021 Nov 10;21(22):7468. doi: 10.3390/s21227468.
In a CNN (convolutional neural network) accelerator, to reduce memory traffic and power consumption, there is a need to exploit the sparsity of activation values. Therefore, some research efforts have been devoted to skipping ineffectual computations (i.e., multiplications by zero). Unlike previous works, in this paper we also point out the similarity of activation values: (1) in the same layer of a CNN model, most feature maps are either highly dense or highly sparse; (2) in the same layer of a CNN model, feature maps in different channels are often similar. Based on these two observations, we propose a block-based compression approach that utilizes both the sparsity and the similarity of activation values to further reduce the data volume. Moreover, we design an encoder, a decoder, and an indexing module to support the proposed approach. The encoder translates output activations into the proposed block-based compression format, while the decoder and the indexing module align nonzero values for effectual computations. Compared with previous works, benchmark data consistently show that the proposed approach greatly reduces both memory traffic and power consumption.
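To make the sparsity-driven part of the idea concrete, the sketch below shows one plausible per-block encoding: highly sparse blocks are stored as a zero bitmap plus packed nonzero values, while highly dense blocks are stored raw, echoing the observation that feature maps tend to be either very sparse or very dense. This is an illustrative assumption only; the abstract does not specify the authors' actual block size, bitmap layout, or mode-selection rule, and the cross-channel similarity encoding is not shown here.

```python
# Minimal sketch of a block-based activation compression scheme.
# Assumptions (not from the paper): block width of 8, a 1-byte zero bitmap,
# and a sparse/dense mode switch at 50% occupancy.
import numpy as np

BLOCK = 8  # assumed number of activation values per compressed block

def compress_block(block):
    """Encode one 1-D block of activations.

    Mostly-zero blocks -> ("sparse", bitmap, nonzero values).
    Mostly-nonzero blocks -> ("dense", None, raw values).
    """
    nonzero = block != 0
    if nonzero.sum() <= BLOCK // 2:                       # sparse mode
        bitmap = int(np.packbits(nonzero.astype(np.uint8))[0])
        return ("sparse", bitmap, block[nonzero].copy())
    return ("dense", None, block.copy())                  # dense mode

def decompress_block(encoded):
    """Reconstruct the original block from either mode."""
    mode, bitmap, values = encoded
    if mode == "dense":
        return values
    out = np.zeros(BLOCK, dtype=values.dtype)
    mask = np.unpackbits(np.array([bitmap], dtype=np.uint8)).astype(bool)
    out[mask] = values                                    # scatter nonzeros back
    return out

# Usage: round-trip a sparse activation block.
act = np.array([0, 3, 0, 0, 7, 0, 0, 1], dtype=np.int8)
enc = compress_block(act)
assert np.array_equal(decompress_block(enc), act)
```

In a hardware setting, the bitmap plays the role the abstract assigns to the indexing module: it lets the datapath align nonzero activations with the corresponding weights so that only effectual multiplications are issued.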