Che Kaiwei, Zhou Zhaokun, Niu Jun, Ma Zhengyu, Fang Wei, Chen Yanqi, Shen Shuaijie, Yuan Li, Tian Yonghong
School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, Guangdong, China.
Peng Cheng Laboratory, Shenzhen, Guangdong, China.
Front Neurosci. 2024 Jul 23;18:1372257. doi: 10.3389/fnins.2024.1372257. eCollection 2024.
The integration of self-attention mechanisms into Spiking Neural Networks (SNNs) has garnered considerable interest in the realm of advanced deep learning, primarily due to their biological properties. Recent advancements in SNN architecture, such as Spikformer, have demonstrated promising outcomes. However, we observe that Spikformer may exhibit excessive energy consumption, potentially attributable to redundant channels and blocks.
To mitigate this issue, we propose a one-shot Spiking Transformer Architecture Search method, namely Auto-Spikformer. Auto-Spikformer extends the search space to include both the Transformer architecture and the inner parameters of SNNs. We train and search the supernet using weight entanglement, evolutionary search, and the proposed Discrete Spiking Parameters Search (DSPS) method. Benefiting from these methods, subnets that inherit weights from the supernet achieve performance comparable to the original Spikformer, even without retraining. Moreover, we propose a new fitness function that seeks a Pareto-optimal combination balancing energy consumption and accuracy.
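The energy-accuracy trade-off described above can be illustrated with a minimal sketch. The scalar weighting and the function names below are hypothetical, not the paper's exact fitness formula; the sketch only shows the general idea of ranking candidate subnets and keeping the non-dominated (Pareto-optimal) ones.

```python
# Hypothetical sketch of a fitness function and Pareto filtering for
# evolutionary subnet search; the weighting scheme is illustrative only.

def fitness(accuracy, energy_mj, alpha=1.0, beta=0.1):
    """Higher is better: reward accuracy, penalize energy consumption."""
    return alpha * accuracy - beta * energy_mj

def pareto_front(candidates):
    """Keep candidates not dominated by any other candidate.

    A candidate (acc, energy) is dominated if some other candidate has
    accuracy >= acc and energy <= energy, with at least one strict.
    """
    front = []
    for acc, en in candidates:
        dominated = any(
            a >= acc and e <= en and (a > acc or e < en)
            for a, e in candidates
        )
        if not dominated:
            front.append((acc, en))
    return front

# Toy (accuracy, energy-in-mJ) pairs for four candidate subnets.
cands = [(0.95, 5.0), (0.94, 3.0), (0.90, 4.0), (0.93, 2.5)]
front = pareto_front(cands)  # (0.90, 4.0) is dominated by (0.94, 3.0)
```

In practice the evolutionary search would evaluate each subnet with inherited supernet weights, score it with such a fitness, and mutate/crossover the highest-ranked candidates.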
Our experimental results demonstrate the effectiveness of Auto-Spikformer, which outperforms the original Spikformer and most CNN and ViT models while using fewer parameters and less energy.