Liu Shuang, Wang J J, Zhou J T, Hu S G, Yu Q, Chen T P, Liu Y
IEEE Trans Biomed Circuits Syst. 2023 Feb;17(1):92-104. doi: 10.1109/TBCAS.2023.3242413.
In this article, we present a spiking neural network (SNN) that combines an SRAM processing-in-memory (PIM) macro with on-chip unsupervised learning based on Spike-Timing-Dependent Plasticity (STDP). Algorithm-hardware co-design, which yields a hardware-friendly SNN and an efficient STDP-based learning methodology, is used to improve area and energy efficiency. The proposed macro uses charge sharing between capacitors to perform fully parallel Reconfigurable Multi-bit PIM Multiply-Accumulate (RMPMA) operations. A thermometer-coded Programmable High-precision PIM Threshold Generator (PHPTG) is designed to achieve low differential non-linearity (DNL) and high linearity. In the macro, each column of PIM cells, together with a comparator, acts as a neuron that accumulates membrane potential and fires spikes. The proposed hardware-friendly architecture uses a simplified Winner-Takes-All (WTA) mechanism. By combining the hardware-friendly STDP algorithm with parallel Word Lines (WLs) and Processing Bit Lines (PBLs), we realize unsupervised learning and recognition on the Modified National Institute of Standards and Technology (MNIST) dataset. The chip was fabricated in a 55 nm CMOS process. Measurements show that the chip achieves a learning energy of 0.47 nJ/pixel and a learning energy efficiency of 70.38 TOPS/W. This work paves the way for on-chip learning in PIM with lower power consumption and fewer hardware resources.
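The abstract does not give circuit details, so the following is only an idealized sketch of two of the mechanisms it names: a column multiply-accumulate computed by charge sharing, and a comparator against a thermometer-coded threshold. It assumes (hypothetically) that each cell stores a multi-bit weight on binary-weighted capacitors (C, 2C, 4C, ...), that capacitors are ideal with no mismatch or parasitics, and that the threshold is built from identical unit elements. All function names (`charge_share_mac`, `thermometer_code`, `threshold_voltage`, `column_fires`) are illustrative and not taken from the paper.

```python
"""Idealized sketch of a charge-sharing PIM column and a thermometer-coded
threshold. Capacitor layout and all names are hypothetical assumptions."""
from typing import List, Sequence


def charge_share_mac(spikes: Sequence[int], weights: Sequence[int],
                     bits: int, vdd: float = 1.0) -> float:
    """Return the shared column voltage after charge sharing.

    Assumption: each cell stores a `bits`-bit weight on binary-weighted
    capacitors (C, 2C, 4C, ...). A capacitor is precharged to `vdd` only if
    the input spike is 1 and its weight bit is 1; otherwise it stays at 0 V.
    All capacitors on the column are then connected together.
    """
    if len(spikes) != len(weights):
        raise ValueError("spikes and weights must have the same length")
    charge = 0.0  # total charge, in units of C * volts
    cap = 0.0     # total capacitance, in units of C
    for x, w in zip(spikes, weights):
        if not 0 <= w < 2 ** bits:
            raise ValueError(f"weight {w} does not fit in {bits} bits")
        for b in range(bits):
            c = 2 ** b  # binary-weighted capacitor for weight bit b
            if x and (w >> b) & 1:
                charge += c * vdd
            cap += c
    # Charge conservation: V_shared = sum(C_i * V_i) / sum(C_i),
    # i.e. vdd * sum(x * w) / (N * (2**bits - 1)) for N cells.
    return charge / cap


def thermometer_code(level: int, n_units: int) -> List[int]:
    """Enable the first `level` of `n_units` identical unit elements."""
    if not 0 <= level <= n_units:
        raise ValueError("level out of range")
    return [1] * level + [0] * (n_units - level)


def threshold_voltage(code: Sequence[int], vdd: float = 1.0) -> float:
    """Each enabled unit contributes an equal share of vdd (ideal, no mismatch)."""
    return vdd * sum(code) / len(code)


def column_fires(spikes: Sequence[int], weights: Sequence[int], bits: int,
                 level: int, n_units: int, vdd: float = 1.0) -> bool:
    """Comparator: fire when the column voltage reaches the threshold."""
    v_col = charge_share_mac(spikes, weights, bits, vdd)
    v_th = threshold_voltage(thermometer_code(level, n_units), vdd)
    return v_col >= v_th
```

For example, `charge_share_mac([1, 0, 1, 1], [3, 2, 1, 0], bits=2)` gives 4/12, about 0.333 (in units of `vdd`), because the active inputs contribute 3 + 1 = 4 unit charges out of 4 cells × 3 unit capacitors. Thermometer coding enables one additional identical unit per code step, so every step is nominally one unit; this is the usual reason it achieves lower DNL than a binary-weighted scheme, where a major-carry transition switches many elements at once.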
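The abstract names a simplified WTA mechanism and a hardware-friendly STDP rule but does not define either. The sketch below is therefore a hypothetical, commonly used simplification, not the authors' method. It assumes integer membrane potentials, that the neuron with the highest potential above threshold is the sole winner (ties go to the lowest index), that all potentials reset after a firing, and that learning is a saturating ±1 update on the winner's weights: inputs that spiked are potentiated and inputs that did not are depressed. `WTALayer`, `stdp_update`, and the 4-bit ceiling `W_MAX` are illustrative assumptions.

```python
"""Hypothetical simplified WTA + STDP sketch; rules and names are assumptions."""
from dataclasses import dataclass, field
from typing import List, Optional

W_MAX = 15  # assumed 4-bit weight ceiling


@dataclass
class WTALayer:
    weights: List[List[int]]  # weights[j][i]: input i -> output neuron j
    threshold: int
    potentials: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.potentials = [0] * len(self.weights)

    def step(self, spikes: List[int]) -> Optional[int]:
        """Integrate one time step; return the winning neuron index or None."""
        # Each neuron (one column) adds the weights of inputs that spiked.
        for j, row in enumerate(self.weights):
            self.potentials[j] += sum(w for w, s in zip(row, spikes) if s)
        candidates = [j for j, v in enumerate(self.potentials) if v >= self.threshold]
        if not candidates:
            return None
        # Simplified WTA: highest potential wins (ties -> lowest index),
        # and every neuron's potential is reset.
        winner = max(candidates, key=lambda j: (self.potentials[j], -j))
        self.potentials = [0] * len(self.weights)
        return winner

    def stdp_update(self, winner: int, recent_spikes: List[int],
                    w_max: int = W_MAX) -> None:
        """Saturating +/-1 update applied only to the winner's weights."""
        row = self.weights[winner]
        for i, s in enumerate(recent_spikes):
            row[i] = min(row[i] + 1, w_max) if s else max(row[i] - 1, 0)
```

A typical loop would call `step` for each time step of an encoded image and, whenever it returns a winner, call `stdp_update` with the input spikes that preceded the firing. Restricting updates to a single winner keeps the write traffic to one column per firing event, which is one plausible reason such simplifications are described as hardware-friendly.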