


Efficient SNN multi-cores MAC array acceleration on SpiNNaker 2.

Authors

Huang Jiaxin, Kelber Florian, Vogginger Bernhard, Liu Chen, Kreutz Felix, Gerhards Pascal, Scholz Daniel, Knobloch Klaus, Mayr Christian G

Affiliations

Infineon Technologies Dresden, Dresden, Germany.

Highly-Parallel VLSI-Systems and Neuro-Microelectronics, Faculty of Electrical and Computer Engineering, Institute of Principles of Electrical and Electronic Engineering, Technische Universität Dresden, Dresden, Germany.

Publication Information

Front Neurosci. 2023 Aug 7;17:1223262. doi: 10.3389/fnins.2023.1223262. eCollection 2023.

DOI: 10.3389/fnins.2023.1223262
PMID: 37609449
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10440698/
Abstract

The potential low-energy feature of the spiking neural network (SNN) engages the attention of the AI community. Only CPU-involved SNN processing inevitably results in an inherently long temporal span in the cases of large models and massive datasets. This study introduces the MAC array, a parallel architecture on each processing element (PE) of SpiNNaker 2, into the computational process of SNN inference. Based on the work of single-core optimization algorithms, we investigate the parallel acceleration algorithms for collaborating with multi-core MAC arrays. The proposed Echelon Reorder model information densification algorithm, along with the adapted multi-core two-stage splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization performance. We evaluate the performance by benchmarking a wide range of constructed SNN models to research on the influence degree of different factors. We also benchmark with two actual SNN models (the gesture recognition model of the real-world application and balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. The echelon optimization algorithm with mixed processors realizes 74.28% and 85.78% memory footprint of the original MAC calculation on these two models, respectively. The execution time of echelon algorithms using only MAC or mixed processors accounts for ≤ 24.56% of the serial ARM baseline. Accelerating SNN inference with algorithms in this study is essentially the general sparse matrix-matrix multiplication (SpGEMM) problem. This article explicitly expands the application field of the SpGEMM issue to SNN, developing novel SpGEMM optimization algorithms fitting the SNN feature and MAC array.
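The abstract frames SNN inference acceleration as a sparse general matrix-matrix multiplication (SpGEMM) problem: spike activity is binary and sparse, so each "multiply" in the MAC degenerates into selectively accumulating weight rows for the neurons that fired. The sketch below illustrates that reduction in plain Python; the function and variable names are illustrative only and do not reflect the paper's actual SpiNNaker 2 implementation or its Echelon Reorder algorithm.

```python
# A minimal sketch of SNN layer inference as SpGEMM, assuming binary
# spikes and a dense weight matrix. Because spikes are 0/1, the inner
# product collapses into accumulating the weight rows of firing inputs,
# which is the structure a MAC array can exploit. Illustrative only.

def snn_layer_spgemm(spike_rows, weights, n_out):
    """Accumulate synaptic input for a batch of timesteps.

    spike_rows: list of lists; spike_rows[t] holds indices of the input
                neurons that fired at timestep t (a sparse binary matrix
                stored row-wise by its non-zero columns).
    weights:    weights[i][j] is the synapse from input i to output j.
    n_out:      number of output neurons.
    Returns one dense accumulator row per timestep.
    """
    out = []
    for fired in spike_rows:
        acc = [0.0] * n_out
        for i in fired:              # visit only non-zero (spiking) inputs
            w_row = weights[i]
            for j in range(n_out):   # MAC step: accumulate the weight row
                acc[j] += w_row[j]
        out.append(acc)
    return out

# Two timesteps, 3 input neurons, 2 output neurons.
W = [[0.5, -1.0],
     [2.0,  0.0],
     [1.0,  1.0]]
spikes = [[0, 2],   # t=0: neurons 0 and 2 fire
          [1]]      # t=1: neuron 1 fires
print(snn_layer_spgemm(spikes, W, 2))  # [[1.5, 0.0], [2.0, 0.0]]
```

The sparser the spike matrix, the fewer weight rows are touched, which is why load balancing such irregular work across multiple PE cores (as the paper's splitting and deployment strategies do) matters for performance.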


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2af/10440698/4bcc47c08d43/fnins-17-1223262-g0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2af/10440698/31e8e6bce990/fnins-17-1223262-g0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2af/10440698/52af22a483ff/fnins-17-1223262-g0011.jpg

Similar Articles

1
Efficient SNN multi-cores MAC array acceleration on SpiNNaker 2.
Front Neurosci. 2023 Aug 7;17:1223262. doi: 10.3389/fnins.2023.1223262. eCollection 2023.
2
sPyNNaker: A Software Package for Running PyNN Simulations on SpiNNaker.
Front Neurosci. 2018 Nov 20;12:816. doi: 10.3389/fnins.2018.00816. eCollection 2018.
3
Benchmarking Highly Parallel Hardware for Spiking Neural Networks in Robotics.
Front Neurosci. 2021 Jun 29;15:667011. doi: 10.3389/fnins.2021.667011. eCollection 2021.
4
E-prop on SpiNNaker 2: Exploring online learning in spiking RNNs on neuromorphic hardware.
Front Neurosci. 2022 Nov 28;16:1018006. doi: 10.3389/fnins.2022.1018006. eCollection 2022.
5
Memory-Efficient Deep Learning on a SpiNNaker 2 Prototype.
Front Neurosci. 2018 Nov 16;12:840. doi: 10.3389/fnins.2018.00840. eCollection 2018.
6
A 510 μW 0.738-mm² 6.2-pJ/SOP Online Learning Multi-Topology SNN Processor With Unified Computation Engine in 40-nm CMOS.
IEEE Trans Biomed Circuits Syst. 2023 Jun;17(3):507-520. doi: 10.1109/TBCAS.2023.3279367. Epub 2023 Jul 12.
7
SSTDP: Supervised Spike Timing Dependent Plasticity for Efficient Spiking Neural Network Training.
Front Neurosci. 2021 Nov 4;15:756876. doi: 10.3389/fnins.2021.756876. eCollection 2021.
8
Real-time cortical simulation on neuromorphic hardware.
Philos Trans A Math Phys Eng Sci. 2020 Feb 7;378(2164):20190160. doi: 10.1098/rsta.2019.0160. Epub 2019 Dec 23.
9
GPUs Outperform Current HPC and Neuromorphic Solutions in Terms of Speed and Energy When Simulating a Highly-Connected Cortical Model.
Front Neurosci. 2018 Dec 12;12:941. doi: 10.3389/fnins.2018.00941. eCollection 2018.
10
Spike-Based Approximate Backpropagation Algorithm of Brain-Inspired Deep SNN for Sonar Target Classification.
Comput Intell Neurosci. 2022 Oct 20;2022:1633946. doi: 10.1155/2022/1633946. eCollection 2022.
