Liu Yuqian, Zhao Chujie, Jiang Yizhou, Fang Ying, Chen Feng
Department of Automation, Tsinghua University, Beijing 100084, China.
LSBDPA Beijing Key Laboratory, Beijing 100084, China.
Biomimetics (Basel). 2024 Jul 6;9(7):413. doi: 10.3390/biomimetics9070413.
The rise of large-scale Transformers has brought challenges in computational cost and energy consumption. In this context, spiking neural networks (SNNs) offer a potential solution owing to their energy efficiency and processing speed. However, the inaccuracy of surrogate gradients and feature-space quantization make it difficult to train deep SNN Transformers directly. To tackle these challenges, we propose a method, called LDD, that aligns artificial neural network (ANN) and SNN features across different abstraction levels in a Transformer network. LDD incorporates structured feature knowledge from ANNs to guide SNN training, preserving crucial information and compensating for surrogate-gradient inaccuracies through layer-wise distillation losses. The proposed approach outperforms existing methods on the CIFAR10 (96.1%), CIFAR100 (82.3%), and ImageNet (80.9%) datasets and enables training of the deepest SNN Transformer network on ImageNet.
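The abstract describes the method only at a high level. As a rough illustration (not the paper's implementation), a minimal PyTorch sketch of a layer-wise distillation loss might look as follows; the function name, the layer-norm scaling, the per-block weights, and the assumption that SNN features are time-averaged before the call are all illustrative choices of this sketch, not details from the paper.

import torch
import torch.nn.functional as F

def layerwise_distillation_loss(ann_feats, snn_feats, weights=None):
    """MSE alignment of per-block SNN features to ANN teacher features.

    ann_feats, snn_feats -- lists with one (batch, tokens, dim) tensor
    per Transformer block; the SNN features are assumed to have been
    averaged over simulation time steps (rate coding) before the call.
    """
    if weights is None:
        weights = [1.0] * len(ann_feats)
    loss = 0.0
    for w, a, s in zip(weights, ann_feats, snn_feats):
        # Layer-normalize both sides so the loss is insensitive to the
        # scale gap between continuous ANN activations and rate-coded
        # SNN activations; the teacher side is detached (frozen).
        a_n = F.layer_norm(a, a.shape[-1:]).detach()
        s_n = F.layer_norm(s, s.shape[-1:])
        loss = loss + w * F.mse_loss(s_n, a_n)
    return loss

# Toy usage with random stand-ins for four Transformer blocks.
if __name__ == "__main__":
    teacher = [torch.randn(8, 197, 384) for _ in range(4)]
    student = [torch.randn(8, 197, 384, requires_grad=True) for _ in range(4)]
    total = layerwise_distillation_loss(teacher, student)
    total.backward()  # gradients flow only into the SNN features
    print(total.item())

In the setting the abstract describes, such a per-block loss would be added to the task loss so that the ANN's structured features guide the SNN through the otherwise inaccurate surrogate-gradient updates.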