Huang Jiaxin, Kelber Florian, Vogginger Bernhard, Liu Chen, Kreutz Felix, Gerhards Pascal, Scholz Daniel, Knobloch Klaus, Mayr Christian G
Infineon Technologies Dresden, Dresden, Germany.
Highly-Parallel VLSI-Systems and Neuro-Microelectronics, Faculty of Electrical and Computer Engineering, Institute of Principles of Electrical and Electronic Engineering, Technische Universität Dresden, Dresden, Germany.
Front Neurosci. 2023 Aug 7;17:1223262. doi: 10.3389/fnins.2023.1223262. eCollection 2023.
The potential low-energy operation of spiking neural networks (SNNs) has attracted the attention of the AI community. SNN processing that relies only on CPUs inevitably incurs long execution times for large models and massive datasets. This study introduces the MAC array, a parallel architecture on each processing element (PE) of SpiNNaker 2, into the computational process of SNN inference. Building on single-core optimization algorithms, we investigate parallel acceleration algorithms that cooperate with multi-core MAC arrays. The proposed Echelon Reorder model-information-densification algorithm, together with the adapted multi-core two-stage splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization performance. We evaluate performance by benchmarking a wide range of constructed SNN models to study how strongly different factors influence the results. We also benchmark two real SNN models (a gesture-recognition model from a real-world application and a balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. On these two models, the echelon optimization algorithm with mixed processors reduces the memory footprint to 74.28% and 85.78% of the original MAC calculation, respectively. The execution time of the echelon algorithms, using only the MAC array or mixed processors, is ≤ 24.56% of the serial ARM baseline. Accelerating SNN inference with the algorithms in this study is essentially a general sparse matrix-matrix multiplication (SpGEMM) problem. This article explicitly extends the application field of SpGEMM to SNNs, developing novel SpGEMM optimization algorithms that fit the characteristics of SNNs and the MAC array.
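To illustrate the SpGEMM framing of SNN inference described above, the following minimal Python sketch computes the synaptic input currents of one layer for one timestep. It is not the paper's Echelon Reorder algorithm; the function name, the row-wise dictionary representation of the sparse weight matrix, and the toy values are assumptions for illustration only. The key point it demonstrates is that only the weight rows of neurons that actually spiked are touched, which is where sparsity savings in SpGEMM-style SNN processing come from.

```python
# Hypothetical sketch (not the paper's method): one SNN layer timestep
# expressed as sparse matrix work. The spike vector is a set of firing
# presynaptic indices; weights is a row-wise sparse matrix stored as
# {pre_index: {post_index: weight}}.

def snn_layer_step(spiking, weights, n_post):
    """Accumulate input currents for n_post postsynaptic neurons.

    spiking : iterable of presynaptic neuron indices that fired this step
    weights : dict mapping pre-index -> {post-index: weight}
    """
    currents = [0.0] * n_post
    for pre in spiking:                  # silent neurons are skipped entirely
        for post, w in weights.get(pre, {}).items():
            currents[post] += w          # scatter-accumulate, as in SpGEMM
    return currents

# Tiny example: 3 presynaptic neurons, 2 postsynaptic neurons.
W = {0: {0: 0.5, 1: -0.25}, 2: {1: 0.75}}
print(snn_layer_step({0, 2}, W, 2))   # neurons 0 and 2 fire
```

Stacking such spike vectors over all timesteps turns the computation into a sparse spike matrix multiplied by the weight matrix, i.e., the SpGEMM problem the article maps onto the MAC array.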