Huang Jiaxin, Kelber Florian, Vogginger Bernhard, Liu Chen, Kreutz Felix, Gerhards Pascal, Scholz Daniel, Knobloch Klaus, Mayr Christian G
Infineon Technologies Dresden, Dresden, Germany.
Highly-Parallel VLSI-Systems and Neuro-Microelectronics, Faculty of Electrical and Computer Engineering, Institute of Principles of Electrical and Electronic Engineering, Technische Universität Dresden, Dresden, Germany.
Front Neurosci. 2023 Aug 7;17:1223262. doi: 10.3389/fnins.2023.1223262. eCollection 2023.
The potential low-energy operation of spiking neural networks (SNNs) has attracted the attention of the AI community. SNN processing that relies only on CPUs inevitably incurs long execution times for large models and massive datasets. This study introduces the MAC array, a parallel architecture on each processing element (PE) of SpiNNaker 2, into the computational process of SNN inference. Building on single-core optimization algorithms, we investigate parallel acceleration algorithms that cooperate with multi-core MAC arrays. The proposed Echelon Reorder model-information-densification algorithm, together with the adapted multi-core two-stage splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization performance. We evaluate performance by benchmarking a wide range of constructed SNN models to study how strongly different factors influence the results. We also benchmark two real SNN models (a gesture-recognition model from a real-world application and a balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. On these two models, the echelon optimization algorithm with mixed processors reduces the memory footprint to 74.28% and 85.78% of the original MAC calculation, respectively. The execution time of the echelon algorithms, using only the MAC array or mixed processors, is ≤ 24.56% of the serial ARM baseline. Accelerating SNN inference with the algorithms in this study is essentially a general sparse matrix-matrix multiplication (SpGEMM) problem. This article explicitly extends the application field of SpGEMM to SNNs, developing novel SpGEMM optimization algorithms that fit the characteristics of SNNs and the MAC array.
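To illustrate the SpGEMM framing of SNN inference described above, the following minimal Python sketch computes the synaptic input currents of one layer for one timestep. It is not the paper's Echelon Reorder algorithm; the function name, the row-wise dictionary representation of the sparse weight matrix, and the toy values are assumptions for illustration only. The key point it demonstrates is that only the weight rows of neurons that actually spiked are touched, which is where sparsity savings in SpGEMM-style SNN processing come from.

```python
# Hypothetical sketch (not the paper's method): one SNN layer timestep
# expressed as sparse matrix work. The spike vector is a set of firing
# presynaptic indices; weights is a row-wise sparse matrix stored as
# {pre_index: {post_index: weight}}.

def snn_layer_step(spiking, weights, n_post):
    """Accumulate input currents for n_post postsynaptic neurons.

    spiking : iterable of presynaptic neuron indices that fired this step
    weights : dict mapping pre-index -> {post-index: weight}
    """
    currents = [0.0] * n_post
    for pre in spiking:                  # silent neurons are skipped entirely
        for post, w in weights.get(pre, {}).items():
            currents[post] += w          # scatter-accumulate, as in SpGEMM
    return currents

# Tiny example: 3 presynaptic neurons, 2 postsynaptic neurons.
W = {0: {0: 0.5, 1: -0.25}, 2: {1: 0.75}}
print(snn_layer_step({0, 2}, W, 2))   # neurons 0 and 2 fire
```

Stacking such spike vectors over all timesteps turns the computation into a sparse spike matrix multiplied by the weight matrix, i.e., the SpGEMM problem the article maps onto the MAC array.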