Guo Wenzhe, Fouda Mohammed E, Eltawil Ahmed M, Salama Khaled Nabil
Sensors Lab, Advanced Membranes and Porous Materials Center (AMPMC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
Communication and Computing Systems Lab, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
Front Neurosci. 2023 Apr 6;17:1047008. doi: 10.3389/fnins.2023.1047008. eCollection 2023.
Directly training spiking neural networks (SNNs) remains challenging due to their complex neural dynamics and the intrinsic non-differentiability of the firing function. The well-known backpropagation through time (BPTT) algorithm used to train SNNs suffers from a large memory footprint and prohibits backward and update unlocking, making it impossible to exploit the potential of locally-supervised training methods. This work proposes an efficient direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm. The proposed algorithm exploits both temporal and spatial locality in BPTT and significantly reduces computational cost, including GPU memory utilization, main-memory access, and arithmetic operations. We thoroughly explore the design space spanned by the temporal truncation length and the local training block size, and benchmark their impact on the classification accuracy of different networks running different types of tasks. The results reveal that temporal truncation degrades accuracy on frame-based datasets but improves accuracy on event-based datasets. Despite the resulting information loss, local training is capable of alleviating overfitting. The combined effect of temporal truncation and local training can slow the accuracy drop and even improve accuracy. In addition, training deep SNN models such as AlexNet on the CIFAR10-DVS dataset yields a 7.26% increase in accuracy, an 89.94% reduction in GPU memory, a 10.79% reduction in memory access, and a 99.64% reduction in MAC operations compared to standard end-to-end BPTT. Thus, the proposed method shows high potential to enable fast and energy-efficient on-chip training for real-time learning at the edge.
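The core idea of temporally-truncated BPTT for SNNs can be illustrated with a minimal sketch: a single leaky-integrate neuron whose non-differentiable firing function is handled by a surrogate gradient, and whose backward pass zeroes the membrane-state gradient carry every K timesteps, as if the state were detached at segment boundaries. This is a hypothetical single-neuron example under simplified assumptions (no reset, sigmoid surrogate, MSE loss on spikes), not the authors' implementation; all parameter names and values here are illustrative.

```python
import math

def surrogate_deriv(v, th=1.0, beta=5.0):
    # Sigmoid-based surrogate derivative for the non-differentiable
    # Heaviside firing function (an assumed, common choice).
    s = 1.0 / (1.0 + math.exp(-beta * (v - th)))
    return beta * s * (1.0 - s)

def forward(w, xs, alpha=0.9, th=1.0):
    # Leaky membrane integration v_t = alpha*v_{t-1} + w*x_t
    # (reset omitted to keep the gradient recurrence simple).
    v, vs, ss = 0.0, [], []
    for x in xs:
        v = alpha * v + w * x
        vs.append(v)
        ss.append(1.0 if v >= th else 0.0)
    return vs, ss

def truncated_bptt_grad(w, xs, ys, K, alpha=0.9, th=1.0):
    """Gradient of L = 0.5*sum_t (s_t - y_t)^2 w.r.t. w, backpropagating
    at most K timesteps: the membrane-state gradient carry is zeroed at
    each segment boundary, as if v were detached there."""
    vs, ss = forward(w, xs, alpha, th)
    grad, carry = 0.0, 0.0
    for t in reversed(range(len(xs))):
        if (t + 1) % K == 0:
            carry = 0.0            # segment boundary: truncate temporal chain
        # dL/dv_t = local error via surrogate + leak-weighted carry from t+1
        dL_dv = (ss[t] - ys[t]) * surrogate_deriv(vs[t], th) + alpha * carry
        grad += dL_dv * xs[t]      # direct path: dv_t/dw = x_t
        carry = dL_dv
    return grad
```

With K equal to (or exceeding) the sequence length, this reduces to standard end-to-end BPTT; smaller K shortens every gradient chain, which is what cuts the stored-state memory roughly in proportion to K. The paper's spatial locality (local training blocks) would additionally stop gradients at layer-block boundaries, which this single-neuron sketch does not show.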