Multi-Input Logic-in-Memory for Ultra-Low Power Non-Von Neumann Computing.

Authors

Zanotti Tommaso, Pavan Paolo, Puglisi Francesco Maria

Affiliation

Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Via P. Vivarelli 10/1, 41125 Modena, Italy.

Publication

Micromachines (Basel). 2021 Oct 14;12(10):1243. doi: 10.3390/mi12101243.

DOI:10.3390/mi12101243

PMID:34683294

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8538894/

Abstract

Logic-in-memory (LIM) circuits based on the material implication logic (IMPLY) and resistive random access memory (RRAM) technologies are a candidate solution for the development of ultra-low power non-von Neumann computing architectures. Such architectures could enable the energy-efficient implementation of hardware accelerators for novel edge computing paradigms such as binarized neural networks (BNNs) which rely on the execution of logic operations. In this work, we present the multi-input IMPLY operation implemented on a recently developed smart IMPLY architecture, SIMPLY, which improves the circuit reliability, reduces energy consumption, and breaks the strict design trade-offs of conventional architectures. We show that the generalization of the typical logic schemes used in LIM circuits to multi-input operations strongly reduces the execution time of complex functions needed for BNNs inference tasks (e.g., the 1-bit Full Addition, XNOR, Popcount). The performance of four different RRAM technologies is compared using circuit simulations leveraging a physics-based RRAM compact model. The proposed solution approaches the performance of its CMOS equivalent while bypassing the von Neumann bottleneck, which gives a huge improvement in bit error rate (by a factor of at least 10) and energy-delay product (projected up to a factor of 10).
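
As a purely logical illustration of the operations named in the abstract, the sketch below models the standard two-input material implication (p IMPLY q = NOT p OR q) on single bits and composes it into XNOR, then uses XNOR and popcount to evaluate a binarized dot product of the kind BNN inference relies on. It is a minimal software sketch, not the authors' circuit: it does not model RRAM devices, the SIMPLY architecture, or the multi-input generalization, and all function names (`imply`, `not_`, `and_`, `xnor`, `popcount`, `bnn_dot`) are hypothetical names chosen for this example. The encoding of +1 as bit 1 and -1 as bit 0 is likewise an assumption.

```python
from typing import Sequence


def imply(p: int, q: int) -> int:
    """Two-input material implication on bits: p -> q = (NOT p) OR q."""
    return (1 - p) | q


def not_(p: int) -> int:
    """NOT built from IMPLY with a 0 operand: NOT p = p -> 0."""
    return imply(p, 0)


def and_(x: int, y: int) -> int:
    """AND built from IMPLY and NOT: x AND y = NOT(x -> NOT y)."""
    return not_(imply(x, not_(y)))


def xnor(a: int, b: int) -> int:
    """XNOR as mutual implication: (a -> b) AND (b -> a)."""
    return and_(imply(a, b), imply(b, a))


def popcount(bits: Sequence[int]) -> int:
    """Number of bits equal to 1."""
    return sum(bits)


def bnn_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of two {-1, +1} vectors encoded as bits (assumed: 1 -> +1, 0 -> -1).

    Each matching bit pair contributes +1 and each mismatch -1,
    so the result is 2 * popcount(xnor(a, b)) - n.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    matches = popcount([xnor(x, y) for x, y in zip(a, b)])
    return 2 * matches - len(a)
```

For example, `bnn_dot([1, 0, 1, 1], [1, 1, 0, 1])` corresponds to the signed vectors (+1, -1, +1, +1) and (+1, +1, -1, +1), whose dot product is 1 - 1 - 1 + 1.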
