Nandakumar S R, Le Gallo Manuel, Piveteau Christophe, Joshi Vinay, Mariani Giovanni, Boybat Irem, Karunaratne Geethan, Khaddam-Aljameh Riduan, Egger Urs, Petropoulos Anastasios, Antonakopoulos Theodore, Rajendran Bipin, Sebastian Abu, Eleftheriou Evangelos
IBM Research - Zurich, Rüschlikon, Switzerland.
Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland.
Front Neurosci. 2020 May 12;14:406. doi: 10.3389/fnins.2020.00406. eCollection 2020.
Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive, and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit, performing the weighted summations and imprecise conductance updates, with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long short-term memory networks, and generative adversarial networks. Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates a 172× improvement in the energy efficiency of the architecture when used for training a multilayer perceptron, compared with a dedicated fully digital 32-bit implementation.
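The mixed-precision idea described in the abstract can be illustrated with a minimal sketch: desired weight updates are accumulated in a high-precision digital variable, and only when the accumulator exceeds the smallest programmable conductance change is an integer number of imprecise device pulses applied to the memory element. All names, the noise model, and the parameter values below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class MixedPrecisionWeight:
    """Sketch of one synaptic weight under a mixed-precision update rule.

    `chi` plays the role of the high-precision accumulator in the digital
    processing unit; `g` stands in for the conductance-encoded weight in
    the computational memory unit. The Gaussian write noise is a stand-in
    for PCM programming non-idealities.
    """

    def __init__(self, w0=0.0, epsilon=0.01, write_noise=0.3):
        self.g = w0                      # weight stored in device conductance
        self.chi = 0.0                   # high-precision update accumulator
        self.epsilon = epsilon           # smallest programmable weight change
        self.write_noise = write_noise   # relative std-dev of one pulse

    def accumulate(self, delta_w):
        """Digital unit: add the desired weight update in high precision."""
        self.chi += delta_w

    def transfer(self):
        """Apply imprecise device pulses once the accumulator warrants them."""
        n_pulses = int(self.chi / self.epsilon)  # truncates toward zero
        for _ in range(abs(n_pulses)):
            # each pulse nominally changes the weight by epsilon, with noise
            step = self.epsilon * (1.0 + self.write_noise * rng.standard_normal())
            self.g += np.sign(n_pulses) * step
        # remove only the transferred portion; the residual stays accumulated
        self.chi -= n_pulses * self.epsilon
```

Because sub-threshold updates are retained in `chi` rather than discarded, many small gradient contributions that an analog device could not resolve individually still reach the weight over time, which is what lets training tolerate the imprecise conductance updates.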