Cheng Qi, Hu Xiaofang, Xiao He, Zhou Yue, Duan Shukai
IEEE Trans Biomed Circuits Syst. 2025 Apr;19(2):404-415. doi: 10.1109/TBCAS.2024.3436837. Epub 2025 Apr 2.
In recent years, the combination of the Attention mechanism and deep learning has found wide application in the field of medical imaging. However, because the Attention computation is complex, existing hardware architectures suffer from high resource consumption or low accuracy, and deploying Attention efficiently on DNN accelerators remains a challenge. This paper proposes an online-programmable Attention hardware architecture based on a compute-in-memory (CIM) macro, which reduces the hardware complexity of Attention and improves integration density, energy efficiency, and calculation accuracy. First, the Attention computation is decomposed into multiple cascaded combined matrix operations to reduce the complexity of its hardware implementation; second, to mitigate the influence of non-ideal hardware characteristics, an online-programmable CIM architecture is designed that improves calculation accuracy by dynamically adjusting the weights; and finally, SPICE simulation verifies that the proposed Attention hardware architecture can be applied to deep neural network inference. Based on a 100 nm CMOS process, compared with traditional Attention hardware architectures, integration density and energy efficiency increase by at least 91.38 times, and latency and computing efficiency improve by about 12.5 times.
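For reference, the computation being mapped to hardware is standard scaled dot-product Attention, which is naturally expressed as a cascade of matrix operations. The minimal sketch below shows this textbook formulation only; the paper's specific hardware decomposition into combined matrix operations is not detailed in the abstract, so the function and variable names here are illustrative assumptions.

```python
# Textbook scaled dot-product Attention as cascaded matrix operations
# (illustrative sketch, not the paper's exact hardware decomposition).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays; returns the (seq_len, d) Attention output."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # matrix op 1: Q K^T, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # matrix op 2: softmax(...) V

# Example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```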