Saadi Sabyasachi, Walid Al Misba, Yixin Shao, Pedram Khalili Amiri, Jayasimha Atulasimha
Department of Mechanical and Nuclear Engineering, Virginia Commonwealth University, Richmond, VA 23284, United States of America.
Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, United States of America.
Nanotechnology. 2025 Jul 15;36(27). doi: 10.1088/1361-6528/ade243.
Artificial neural network (ANN) inference involves matrix-vector multiplications that require a very large number of multiply-and-accumulate operations, resulting in high energy cost and a large device footprint. Stochastic computing (SC) offers a less resource-intensive ANN implementation with minimal accuracy loss. Random number generators (RNGs) are required to implement SC in hardware. These can be realized through stochastic magnetic tunnel junctions (s-MTJs), where the energy barrier for switching between the 'up' and 'down' states is designed to be small, enabling thermal noise to generate a random bit stream. While s-MTJs have previously been used to implement SC-ANNs, these studies have been limited to architectures with continuously varying (i.e., analog) weights. In this work, we study the use of SC for matrix-vector multiplication with synaptic weights and quantized outputs. We show that a quantized SC-ANN, implemented using experimentally obtained s-MTJ bitstreams and a limited number of discrete quantized states for both the weights and the hidden-layer nodes of an ANN, can effectively reduce time (latency) and energy consumption in SC compared to an analog implementation, while largely preserving accuracy. We implemented quantization with 5 and 11 quantized states, along with SC configured with stochastic bitstream lengths of 100, 200, 300, 400, and 500, on neural networks with one hidden layer and with three hidden layers. Inference was performed on the MNIST dataset for networks trained both with and without SC. Training with SC provided better accuracy in all cases. For the shortest bitstream of 100 bits, the highest accuracies were 92% for one hidden layer and over 96% for three hidden layers. The overall system attained its peak accuracy of 96.82% using a 400-bit stochastic bitstream with three hidden layers. Our investigations demonstrate a 9× improvement in latency and a 2.6× improvement in energy consumption using the quantized SC approach compared to a similar s-MTJ based ANN architecture without quantization.
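The core mechanism described above, unipolar stochastic computing with quantized outputs, can be illustrated with a short sketch. The following Python snippet is an illustrative assumption rather than the authors' implementation: a software pseudo-random generator stands in for the experimentally obtained s-MTJ bitstreams. It encodes values in [0, 1] as stochastic bitstreams, multiplies them by bitwise AND, and snaps the estimate to a small number of discrete states (e.g. 5 or 11), mirroring the quantized SC approach studied in the paper.

import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(x, length):
    # Encode a value x in [0, 1] as a unipolar stochastic bitstream:
    # each bit is 1 with probability x. In hardware an s-MTJ would
    # supply this randomness; here a pseudo-random generator stands in.
    return (rng.random(length) < x).astype(np.uint8)

def sc_multiply(x, w, length):
    # Unipolar SC multiplication: ANDing two independent bitstreams
    # yields a stream whose mean approximates the product x * w.
    return np.mean(to_bitstream(x, length) & to_bitstream(w, length))

def quantize(v, levels):
    # Snap a value in [0, 1] to one of `levels` evenly spaced states
    # (e.g. 5 or 11 quantized states, as considered in the paper).
    return np.round(v * (levels - 1)) / (levels - 1)

# Example: approximate 0.6 * 0.5 with a 400-bit stream, then quantize
# the estimate to 11 states.
estimate = sc_multiply(0.6, 0.5, length=400)
print(estimate, quantize(estimate, levels=11))

Longer bitstreams reduce the variance of the SC estimate at the cost of latency and energy, which is the trade-off the paper quantifies for lengths of 100 to 500 bits.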