Srinivasan Gopalakrishnan, Roy Kaushik
Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States.
Front Neurosci. 2021 Oct 29;15:603433. doi: 10.3389/fnins.2021.603433. eCollection 2021.
Spiking neural networks (SNNs), with their inherent capability to learn sparse spike-based input representations over time, offer a promising solution for enabling the next generation of intelligent autonomous systems. Nevertheless, end-to-end training of deep SNNs is both compute- and memory-intensive because of the need to backpropagate error gradients through time. We propose BlocTrain, a scalable, complexity-aware incremental algorithm for memory-efficient training of deep SNNs. We divide a deep SNN into blocks, where each block consists of a few convolutional layers followed by a classifier, and train the blocks sequentially using local errors from their classifiers. Once a given block is trained, our algorithm dynamically identifies easy versus hard classes from the class-wise accuracy and trains the deeper block only on the hard-class inputs. In addition, we incorporate a hard class detector (HCD) per block, which is used during inference to exit early on easy-class inputs and to activate the deeper blocks only for hard-class inputs. Using BlocTrain, we trained a ResNet-9 SNN divided into three blocks on CIFAR-10 and obtained 86.4% accuracy, with up to 2.95× lower memory requirement during training and 1.89× higher compute efficiency per inference (due to the early-exit strategy), at the cost of 1.45× memory overhead (primarily from the classifier weights), relative to the end-to-end network. We also trained a ResNet-11, divided into four blocks, on CIFAR-100 and obtained 58.21% accuracy, one of the first reported accuracies for an SNN trained entirely with spike-based backpropagation on CIFAR-100.
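To make the block-wise training and early-exit ideas concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation. The class and function names (Block, train_blocks, predict_with_early_exit), the layer sizes, and the confidence threshold are illustrative assumptions; the spiking (LIF) dynamics, surrogate-gradient backpropagation through time, and the per-block hard class detector and hard-class filtering described in the abstract are simplified away, with ordinary ReLU convolutions and a softmax-confidence exit standing in.

```python
# Hedged sketch of block-wise training with local classifiers and early exit.
# Hypothetical architecture and hyperparameters; spiking dynamics and the
# hard class detector (HCD) are replaced by non-spiking stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """A few conv layers followed by a local (auxiliary) classifier."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes)
        )

    def forward(self, x):
        feats = self.features(x)
        return feats, self.classifier(feats)

def train_blocks(blocks, loader, epochs=1, lr=1e-3, device="cpu"):
    """Train blocks one after another using only local classifier errors.
    Earlier blocks are frozen (no_grad), so gradients never propagate through
    them -- this locality is what reduces training memory in the abstract."""
    for i, block in enumerate(blocks):
        block.train().to(device)
        opt = torch.optim.Adam(block.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                with torch.no_grad():           # frozen, already-trained blocks
                    for prev in blocks[:i]:
                        x, _ = prev(x)
                _, logits = block(x)
                loss = F.cross_entropy(logits, y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        block.eval()
        # In the paper, class-wise accuracy of this block's classifier would be
        # computed here to restrict the next block's training data to hard classes.

@torch.no_grad()
def predict_with_early_exit(blocks, x, conf_threshold=0.9):
    """Exit at the first block whose classifier is confident enough
    (a simple proxy for the paper's hard class detector); expects batch size 1."""
    for block in blocks:
        x, logits = block(x)
        conf, pred = F.softmax(logits, dim=1).max(dim=1)
        if conf.item() >= conf_threshold:
            return pred.item()
    return pred.item()   # fall back to the deepest block's prediction
```

In this sketch the memory saving comes from optimizing only one block's parameters at a time while earlier blocks run under no_grad, and the inference saving comes from skipping deeper blocks whenever the local classifier is already confident; both mirror, in simplified form, the local-error training and HCD-based early exit described above.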