Institute of Microelectronics, Beijing Innovation Center for Future Chips (ICFC), Tsinghua University, Beijing, China.
Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China.
Nature. 2020 Jan;577(7792):641-646. doi: 10.1038/s41586-020-1942-4. Epub 2020 Jan 29.
Memristor-enabled neuromorphic computing systems provide a fast and energy-efficient approach to training neural networks. However, convolutional neural networks (CNNs), one of the most important models for image recognition, have not yet been fully hardware-implemented using memristor crossbars, which are cross-point arrays with a memristor device at each intersection. Moreover, achieving software-comparable results is highly challenging owing to the poor yield, large variation and other non-ideal characteristics of devices. Here we report the fabrication of high-yield, high-performance and uniform memristor crossbar arrays for the implementation of CNNs, which integrate eight 2,048-cell memristor arrays to improve parallel-computing efficiency. In addition, we propose an effective hybrid-training method to adapt to device imperfections and improve the overall system performance. We built a five-layer memristor-based CNN to perform MNIST image recognition, and achieved a high accuracy of more than 96 per cent. In addition to parallel convolutions using different kernels with shared inputs, replication of multiple identical kernels in memristor arrays was demonstrated for processing different inputs in parallel. The memristor-based CNN neuromorphic system has an energy efficiency more than two orders of magnitude greater than that of state-of-the-art graphics-processing units, and is shown to be scalable to larger networks, such as residual neural networks. Our results are expected to enable a viable memristor-based non-von Neumann hardware solution for deep neural networks and edge computing.
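The idea of running several kernels in parallel on shared inputs can be illustrated with a minimal sketch of an idealized crossbar. It is not the authors' implementation: the differential conductance-pair encoding of signed weights, the constant `G_MAX`, and every function name below are assumptions made for illustration, and device non-idealities (yield, variation, noise) are ignored. Each kernel is treated as one column of conductances, each image patch as a vector of input voltages, and the column current is the sum of voltage times conductance.

```python
# Hypothetical sketch: convolution as weighted current sums on an idealized crossbar.
# G_MAX, the differential-pair weight encoding, and all names are illustrative assumptions.

G_MAX = 1.0  # assumed maximum device conductance, arbitrary units


def weights_to_conductances(kernel, w_max):
    """Map each signed weight to a (g_pos, g_neg) pair of non-negative conductances."""
    pairs = []
    for w in kernel:
        g = abs(w) / w_max * G_MAX
        pairs.append((g, 0.0) if w >= 0 else (0.0, g))
    return pairs


def crossbar_column(voltages, pairs):
    """Sum the currents V * G along a differential column pair and return their difference."""
    i_pos = sum(v * gp for v, (gp, _) in zip(voltages, pairs))
    i_neg = sum(v * gn for v, (_, gn) in zip(voltages, pairs))
    return i_pos - i_neg


def conv2d_crossbar(image, kernels, k):
    """Slide a k x k window over the image; every kernel column receives the same patch."""
    # Assumes at least one non-zero weight, so w_max > 0.
    w_max = max(abs(w) for kern in kernels for w in kern)
    columns = [weights_to_conductances(kern, w_max) for kern in kernels]
    height, width = len(image), len(image[0])
    outputs = [[] for _ in kernels]
    for r in range(height - k + 1):
        rows = [[] for _ in kernels]
        for c in range(width - k + 1):
            # Flatten the patch row by row; these values act as the shared input voltages.
            patch = [image[r + i][c + j] for i in range(k) for j in range(k)]
            for n, col in enumerate(columns):
                # Rescale the column current back to weight units.
                rows[n].append(crossbar_column(patch, col) * w_max / G_MAX)
        for n in range(len(kernels)):
            outputs[n].append(rows[n])
    return outputs


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[1, 0, 0, -1], [0.5, 0.5, 0.5, 0.5]]  # two flattened 2 x 2 kernels
feature_maps = conv2d_crossbar(image, kernels, 2)
```

In this sketch the two kernels share every input patch, which mirrors the shared-input parallelism described in the abstract. Replicating an identical kernel to process different inputs in parallel would correspond to duplicating a column and feeding each copy a different patch.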