Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small.

Authors

Wang Zhehui, Luo Tao, Liu Cheng, Liu Weichen, Goh Rick Siow Mong, Wong Weng-Fai

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):916-933. doi: 10.1109/TPAMI.2024.3483654. Epub 2025 Jan 9.

DOI: 10.1109/TPAMI.2024.3483654
PMID: 39423084
Abstract

Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, which demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors possess higher density compared to conventional memory technologies, making them highly suitable for effectively managing the extreme model size associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs increases rapidly, already surpassing the capabilities of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at performing linear operations, they are not capable of executing complex nonlinear operations in LLM such as softmax and layer normalization. To address these challenges, we present a novel architecture for the memristor crossbar that enables the deployment of state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture achieves enhancements of up to in area overhead and in energy consumption. Compared to modern TPU/GPU systems, our architecture demonstrates at least a reduction in the area-delay product and a significant 69% energy consumption reduction.
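To make the abstract's second and third challenges concrete, below is a minimal NumPy sketch. It is illustrative only: the dimensions, variable names, and mapping are assumptions, not the paper's architecture. It contrasts the weight-stationary matrix-vector multiply that a crossbar computes natively with the activation-by-activation product inside multi-head attention, and marks softmax as a nonlinear step the analog array cannot perform.

```python
import numpy as np

# Hypothetical toy dimensions, for illustration only.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 16

# Weight-stationary case: weights are programmed once as conductances G.
# Applying input voltages v yields output currents i = G @ v via Ohm's law
# and Kirchhoff's current law, i.e. the crossbar's native analog MVM.
G = rng.uniform(0.0, 1.0, size=(d_ff, d_model))   # conductances (programmed once)
v = rng.uniform(-1.0, 1.0, size=d_model)          # input voltages (activations)
i = G @ v                                          # column currents = MVM result

# Multi-head attention breaks this model: scores = Q @ K.T multiplies two
# runtime activation matrices, so neither operand can be pre-programmed as
# stationary conductances. A naive mapping would rewrite the crossbar for
# every token -- the non-weight-stationary multiplication the abstract says
# traditional crossbars cannot support.
seq_len, d_head = 4, 8
Q = rng.standard_normal((seq_len, d_head))
K = rng.standard_normal((seq_len, d_head))
scores = Q @ K.T / np.sqrt(d_head)

# Softmax (computed here) and layer normalization are nonlinear operations
# that the analog array cannot execute; they must be handled outside it.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
```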

Similar Articles

1. Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small.
   IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):916-933. doi: 10.1109/TPAMI.2024.3483654. Epub 2025 Jan 9.
2. Memristor crossbar arrays with 6-nm half-pitch and 2-nm critical dimension.
   Nat Nanotechnol. 2019 Jan;14(1):35-39. doi: 10.1038/s41565-018-0302-0. Epub 2018 Nov 12.
3. Efficient combinatorial optimization by quantum-inspired parallel annealing in analogue memristor crossbar.
   Nat Commun. 2023 Sep 22;14(1):5927. doi: 10.1038/s41467-023-41647-2.
4. Research on the Impact of Data Density on Memristor Crossbar Architectures in Neuromorphic Pattern Recognition.
   Micromachines (Basel). 2023 Oct 27;14(11):1990. doi: 10.3390/mi14111990.
5. Area-Efficient Mapping of Convolutional Neural Networks to Memristor Crossbars Using Sub-Image Partitioning.
   Micromachines (Basel). 2023 Jan 25;14(2):309. doi: 10.3390/mi14020309.
6. Linear conductance update improvement of CMOS-compatible second-order memristors for fast and energy-efficient training of a neural network using a memristor crossbar array.
   Nanoscale Horiz. 2023 Sep 26;8(10):1366-1376. doi: 10.1039/d3nh00121k.
7. Asymmetrical Training Scheme of Binary-Memristor-Crossbar-Based Neural Networks for Energy-Efficient Edge-Computing Nanoscale Systems.
   Micromachines (Basel). 2019 Feb 20;10(2):141. doi: 10.3390/mi10020141.
8. Memristor-CMOS Hybrid Neuron Circuit with Nonideal-Effect Correction Related to Parasitic Resistance for Binary-Memristor-Crossbar Neural Networks.
   Micromachines (Basel). 2021 Jul 1;12(7):791. doi: 10.3390/mi12070791.
9. Synapse-Neuron-Aware Training Scheme of Defect-Tolerant Neural Networks with Defective Memristor Crossbars.
   Micromachines (Basel). 2022 Feb 8;13(2):273. doi: 10.3390/mi13020273.
10. Nano-Crossbar Weighted Memristor-Based Convolution Neural Network Architecture for High-Performance Artificial Intelligence Applications.
    J Nanosci Nanotechnol. 2021 Mar 1;21(3):1833-1844. doi: 10.1166/jnn.2021.18910.

Cited By

1. Large Language Models in Medicine: Applications, Challenges, and Future Directions.
   Int J Med Sci. 2025 May 31;22(11):2792-2801. doi: 10.7150/ijms.111780. eCollection 2025.