Qiu Haonan, Ning Munan, Song Zeyin, Fang Wei, Chen Yanqi, Sun Tao, Ma Zhengyu, Yuan Li, Tian Yonghong
Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China.
Neural Netw. 2024 Oct;178:106475. doi: 10.1016/j.neunet.2024.106475. Epub 2024 Jun 19.
Spiking neural networks (SNNs) have attracted attention due to their biological plausibility and their potential for low-energy applications on neuromorphic hardware. Two mainstream approaches are commonly used to obtain SNNs: ANN-to-SNN conversion methods and directly-trained-SNN methods. However, the former achieve excellent performance at the cost of a large number of time steps (i.e., high latency), while the latter exhibit lower latency but suffer from suboptimal performance. To tackle this performance-latency trade-off, we propose Self-Architectural Knowledge Distillation (SAKD), an intuitive and effective method for SNNs that leverages Knowledge Distillation (KD). SAKD adopts a bilevel teacher-student training strategy: level-1 directly transfers pre-trained ANN weights to an SNN with the same architecture, and level-2 encourages the SNN to mimic the ANN's behavior at both the final-response and intermediate-feature levels. Learning from informative supervision signals provided by labels and ANNs, SAKD achieves new state-of-the-art (SOTA) performance with few time steps on widely used classification benchmark datasets. On ImageNet-1K, with only 4 time steps, our Spiking-ResNet34 model attains a Top-1 accuracy of 70.04%, outperforming previous SOTA methods with the same architecture. Notably, our SEW-ResNet152 model reaches a Top-1 accuracy of 77.30% on ImageNet-1K, setting a new SOTA benchmark for SNNs. Furthermore, we apply SAKD to dense prediction downstream tasks, such as object detection and semantic segmentation, demonstrating strong generalization ability and superior performance. In conclusion, the proposed SAKD framework presents a promising approach for achieving both high performance and low latency in SNNs, potentially paving the way for future advancements in the field.
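The abstract does not specify implementation details, but a minimal sketch of the level-2 objective described above, assuming a PyTorch-style setup, might look like the following. The function name sakd_loss, the tensor arguments, and the hyperparameters tau, alpha, and beta are illustrative assumptions, not values from the paper; the level-1 step (copying same-architecture ANN weights into the SNN) is indicated only as a comment.

```python
import torch
import torch.nn.functional as F

# Level-1 (assumed): initialize the SNN from a same-architecture pre-trained ANN,
# e.g. snn.load_state_dict(ann.state_dict(), strict=False), before training starts.

def sakd_loss(snn_logits, ann_logits, snn_feats, ann_feats, labels,
              tau=4.0, alpha=1.0, beta=1.0):
    """Sketch of a combined SAKD-style objective: task loss + response KD + feature KD."""
    # Standard supervised loss against the ground-truth labels.
    ce = F.cross_entropy(snn_logits, labels)
    # Response-level distillation: the SNN's logits mimic the ANN teacher's softened outputs.
    kd = F.kl_div(F.log_softmax(snn_logits / tau, dim=1),
                  F.softmax(ann_logits / tau, dim=1),
                  reduction="batchmean") * tau ** 2
    # Feature-level distillation: match intermediate representations layer by layer.
    feat = sum(F.mse_loss(s, a) for s, a in zip(snn_feats, ann_feats))
    return ce + alpha * kd + beta * feat

# Toy usage with random tensors standing in for real SNN/ANN outputs.
labels = torch.randint(0, 10, (8,))
snn_logits, ann_logits = torch.randn(8, 10), torch.randn(8, 10)
snn_feats = [torch.randn(8, 64, 4, 4)]
ann_feats = [torch.randn(8, 64, 4, 4)]
loss = sakd_loss(snn_logits, ann_logits, snn_feats, ann_feats, labels)
print(loss.item())
```

In an actual SNN training loop, the spiking outputs would typically be averaged over the time steps before being compared with the ANN teacher's activations; the weighting between the three terms is a tuning choice not stated in the abstract.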