Bettayeb Meriem, Halawani Yasmin, Khan Muhammad Umair, Saleh Hani, Mohammad Baker
System-on-Chip Lab, Computer and Information Engineering, Khalifa University, Abu Dhabi, UAE.
Computer Science and Information Technology Department, College of Engineering, Abu Dhabi University, Abu Dhabi, UAE.
Sci Rep. 2024 Oct 15;14(1):24173. doi: 10.1038/s41598-024-75021-z.
The adoption of transformer networks has surged across AI applications. However, their computational complexity, stemming primarily from the self-attention mechanism, constrains their capability and speed in much the same way that convolution operations constrain convolutional neural networks (CNNs). The self-attention algorithm, specifically its matrix-matrix multiplication (MatMul) operations, demands substantial memory and computation, limiting overall transformer performance. This paper introduces an efficient hardware accelerator for transformer networks that leverages memristor-based in-memory computing. The design targets the memory bottleneck of the MatMul operations in self-attention, exploiting approximate analog computation and the highly parallel computation offered by the memristor crossbar architecture. This approach reduces the number of multiply-accumulate (MAC) operations in the transformer network by approximately 10 times while maintaining 95.47% accuracy on the MNIST dataset, as validated with the NeuroSim 3.0 circuit-level simulator. Simulation results indicate an area utilization of 6895.7, a latency of 15.52 s, an energy consumption of 3 mJ, and a leakage power of 59.55. The methodology outlined in this paper represents a substantial step toward a hardware-friendly transformer architecture for edge devices capable of real-time performance.
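To make the targeted bottleneck concrete, the sketch below models the self-attention score MatMul (Q·Kᵀ) being offloaded to a memristor crossbar: weights are quantized onto discrete conductance levels, column currents accumulate the MAC results, and read noise approximates analog error. This is a minimal illustrative sketch, not the authors' design or NeuroSim flow; the conductance range, number of levels, noise figure, and matrix sizes are all assumed values.

```python
# Illustrative sketch (not from the paper): approximating the self-attention
# score MatMul (Q @ K^T) with a quantized "memristor crossbar" model.
# Conductance range, level count, noise, and dimensions are assumptions.
import numpy as np

def program_crossbar(weights, levels=16, g_min=1e-6, g_max=1e-4):
    """Map a weight matrix onto a discrete set of conductance levels."""
    w_min, w_max = weights.min(), weights.max()
    # Quantize weights to `levels` steps, then scale into [g_min, g_max].
    q = np.round((weights - w_min) / (w_max - w_min + 1e-12) * (levels - 1))
    g = g_min + q / (levels - 1) * (g_max - g_min)
    return g, (w_min, w_max)

def crossbar_matmul(inputs, g, w_range, g_min=1e-6, g_max=1e-4, noise=0.02):
    """Analog-style MAC: input 'voltages' drive rows of conductances;
    column currents (sums of V*G) give the dot products, with read noise."""
    w_min, w_max = w_range
    currents = inputs @ g                      # Ohm's law + Kirchhoff summation
    currents *= 1.0 + noise * np.random.randn(*currents.shape)
    # Undo the weight-to-conductance mapping to recover weight-domain products.
    scale = (w_max - w_min) / (g_max - g_min)
    offset = w_min - g_min * scale
    return currents * scale + inputs.sum(axis=-1, keepdims=True) * offset

rng = np.random.default_rng(0)
d, n = 64, 16                                  # head dim, sequence length (assumed)
Q, K = rng.standard_normal((n, d)), rng.standard_normal((n, d))

g, w_range = program_crossbar(K.T)             # K^T programmed onto the crossbar
scores_approx = crossbar_matmul(Q, g, w_range) # approximate Q @ K^T in-memory
scores_exact = Q @ K.T
print("mean abs error:", np.abs(scores_approx - scores_exact).mean())
```

In this toy model every row of Q is applied to the crossbar at once, so an n×d by d×n MatMul completes in n analog read cycles instead of n·n·d digital MACs, which is the parallelism the abstract attributes to the crossbar architecture; the quantization and noise terms stand in for the accuracy cost of approximate analog computation.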