Sadeghi Maryam, Rezaeiyan Yasser, Khatiboun Dario Fernandez, Eissa Sherif, Corradi Federico, Augustine Charles, Moradi Farshad
IEEE Trans Biomed Circuits Syst. 2025 Jun;19(3):523-535. doi: 10.1109/TBCAS.2024.3452635.
The realization of brain-scale spiking neural networks (SNNs) is impeded by power constraints and low integration density. To address these challenges, multi-core SNNs are used to emulate large numbers of neurons with high energy efficiency, with spike packets routed through a network-on-chip (NoC). However, information can be lost in the NoC under high spike traffic, leading to performance degradation. This work presents NEXUS, a 16-core SNN with a diamond-shaped NoC topology fabricated in 28-nm CMOS technology. It integrates 4096 leaky integrate-and-fire (LIF) neurons with 1M 4-bit synaptic weights and occupies an area of 2.16 mm$^2$. The proposed NoC architecture is scalable to any network size and ensures no data loss due to contending packets, with a maximum routing latency of 5.1 $\mu$s for 16 cores. The proposed congestion management method eliminates the need for FIFOs in the routers, resulting in a compact router footprint of 0.001 mm$^2$. The proposed neurosynaptic core increases processing speed by up to 8.5$\times$, depending on input sparsity. The SNN achieves a peak throughput of 4.7 GSOP/s at 0.9 V and a minimum energy per synaptic operation (SOP) of 3.3 pJ at 0.55 V. A 4-layer feed-forward network mapped onto the chip classifies MNIST digits with 92.3% accuracy at 8.4K classifications/s, consuming 2.7 $\mu$J per classification. Additionally, an audio recognition task mapped onto the chip achieves 87.4% accuracy at 215 $\mu$J per classification.
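The headline figures above imply a few derived quantities that are not stated in the abstract. The following is a back-of-envelope sketch only. It assumes that "1M" means 2^20 weights, that neurons and weights are spread evenly across the 16 cores, and that the MNIST classification energy is spent entirely on synaptic operations costing at least the minimum 3.3 pJ each. The function name `derived_figures` and the dictionary keys are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
NUM_CORES = 16
NUM_NEURONS = 4096
NUM_WEIGHTS = 2**20              # assumption: "1M" is read as 2**20
WEIGHT_BITS = 4
MNIST_RATE_HZ = 8.4e3            # classifications per second
MNIST_ENERGY_J = 2.7e-6          # joules per classification
MIN_ENERGY_PER_SOP_J = 3.3e-12   # joules per synaptic operation at 0.55 V


def derived_figures():
    """Return rough quantities implied by the quoted figures (estimates only)."""
    return {
        # Assumes an even split of neurons across cores.
        "neurons_per_core": NUM_NEURONS // NUM_CORES,
        # Assumes an even split of weights across neurons.
        "weights_per_neuron": NUM_WEIGHTS // NUM_NEURONS,
        # Total weight storage: bits -> bytes -> KiB.
        "weight_memory_kib": NUM_WEIGHTS * WEIGHT_BITS / 8 / 1024,
        # Average power while classifying MNIST = rate * energy per item.
        "mnist_power_mw": MNIST_RATE_HZ * MNIST_ENERGY_J * 1e3,
        # Rough upper bound on SOPs per MNIST classification, assuming all
        # energy goes to SOPs at the minimum per-SOP cost. The abstract does
        # not state the supply voltage used for the MNIST measurement.
        "mnist_sops_upper_bound": MNIST_ENERGY_J / MIN_ENERGY_PER_SOP_J,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.6g}")
```

Under these assumptions the chip holds 256 neurons per core and 256 weights per neuron, and the MNIST workload draws roughly 23 mW on average. These are estimates derived from the abstract's numbers, not measurements reported by the authors.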