


Efficient self-attention with smart pruning for sustainable large language models.

Authors

Belhaouari Samir Brahim, Kraidia Insaf

Affiliations

Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Ar-Rayyan, Qatar.

Faculty of Information Technology, Department of Networks and Cybersecurity, Al-Ahliyya Amman University, Amman, Jordan.

Publication

Sci Rep. 2025 Mar 24;15(1):10171. doi: 10.1038/s41598-025-92586-5.

DOI: 10.1038/s41598-025-92586-5
PMID: 40128247
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11933332/
Abstract

Large Language Models (LLMs) have revolutionized artificial intelligence by enabling multitasking across diverse fields. However, their high computational demands result in significant environmental impacts, particularly in terms of energy and water consumption. This paper addresses these issues by proposing an innovative compression approach to reducing LLM sizes. We focus on compressing the internal transformer layers, which are critical contributors to LLMs' computational complexity. Our approach combines new mathematical and structural key methods for model compression. We begin by applying Forward Propagation Pruning (FPP) to compress the embedding and feed-forward layers, utilizing a weight freezing and zeroing technique for suspected unused parameters. This reduces the number of trainable parameters, accelerating the overall training process and enabling faster convergence. Second, the Weight Matrix Folding method is introduced to efficiently prune the self-attention layer matrices in a simple and efficient mathematical model. This method integrates Identical Row Compression (IRC) to optimize the compression of the Query and Key matrices, alongside Diagonal Weight Compression (DWC), which reformulates the Value matrix into a diagonal structure. Consequently, this technique significantly diminishes parameter variability across the three matrices, enhancing consistency and performance while simplifying complexity. The compression approach is evaluated on three language modeling datasets and eight widely used classification datasets, comparing it to various pruning methods. Our method successfully compresses transformer layers by 99% and linear layers by 70%, resulting in an overall model compression of around 70%, while maintaining nearly the same accuracy. Notably, with moderate compression rates of 20% to 40%, model performance not only remained stable but even improved. This leads to substantial reductions in memory usage and computational demands, making LLMs more resource-efficient and highlighting the potential to optimize them for a more sustainable AI future.
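The abstract names the techniques but does not reproduce their formulations, so the following is only a minimal NumPy sketch of the two underlying ideas under stated assumptions: an FPP-style step is approximated here as magnitude-based zeroing of a feed-forward weight matrix (the paper's actual freezing/zeroing criterion is not given), and a DWC-style step is approximated as keeping only the diagonal of the Value projection. All names (`W_ff`, `W_v`, the 70% sparsity target) are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# FPP-style step (illustrative): zero out weights whose magnitude falls
# below a threshold, treating them as "suspected unused" parameters.
# The zeroed positions would then be frozen during further training.
W_ff = rng.normal(size=(d_model, d_model))
threshold = np.quantile(np.abs(W_ff), 0.70)   # target roughly 70% sparsity
mask = np.abs(W_ff) >= threshold
W_ff_pruned = W_ff * mask

# DWC-style step (illustrative): replace the dense Value matrix with its
# diagonal, shrinking d_model*d_model parameters down to d_model.
W_v = rng.normal(size=(d_model, d_model))
w_v_diag = np.diag(W_v)                        # keep only diagonal entries
x = rng.normal(size=(d_model,))
v_compressed = x * w_v_diag                    # same as x @ np.diag(w_v_diag)

print(f"feed-forward sparsity: {1 - mask.mean():.2f}")
print("Value-matrix parameters:", W_v.size, "->", w_v_diag.size)
```

The second step shows why a diagonal Value matrix is cheap: the projection collapses from a matrix multiply to an elementwise product, which is where the large parameter and compute savings reported in the abstract would come from.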


Figures 1-16 (full-size images via PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/bee428785ef0/41598_2025_92586_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/8e3deaf894e5/41598_2025_92586_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/bd387a69afd4/41598_2025_92586_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/c68020e9b11f/41598_2025_92586_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/bf886f102805/41598_2025_92586_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/1fb023fe54ab/41598_2025_92586_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/1208ec7f1e1d/41598_2025_92586_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/a9aaf0914e93/41598_2025_92586_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/cdfdf5d12ad0/41598_2025_92586_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/223f20194764/41598_2025_92586_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/1654ab492413/41598_2025_92586_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/8a658baa9fce/41598_2025_92586_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/5afbaa6c1494/41598_2025_92586_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/31dd10fb23c3/41598_2025_92586_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/e858e003ba15/41598_2025_92586_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/3e4f68c59970/41598_2025_92586_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/30ce001e1473/41598_2025_92586_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3e/11933332/f394dfe04695/41598_2025_92586_Fig16_HTML.jpg

Similar Articles

1. Efficient self-attention with smart pruning for sustainable large language models. Sci Rep. 2025 Mar 24;15(1):10171. doi: 10.1038/s41598-025-92586-5.
2. Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis. J Med Internet Res. 2024 Dec 27;26:e66114. doi: 10.2196/66114.
3. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res. 2024 Apr 25;26:e56764. doi: 10.2196/56764.
4. Developing healthcare language model embedding spaces. Artif Intell Med. 2024 Dec;158:103009. doi: 10.1016/j.artmed.2024.103009. Epub 2024 Oct 31.
5. Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study. JMIR AI. 2025 Feb 24;4:e58670. doi: 10.2196/58670.
6. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Med Inform. 2024 May 10;12:e53787. doi: 10.2196/53787.
7. Examining the Role of Large Language Models in Orthopedics: Systematic Review. J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.
8. Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values. JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
9. Weak sub-network pruning for strong and efficient neural networks. Neural Netw. 2021 Dec;144:614-626. doi: 10.1016/j.neunet.2021.09.015. Epub 2021 Sep 30.
10. Reconciling the contrasting narratives on the environmental impact of large language models. Sci Rep. 2024 Nov 1;14(1):26310. doi: 10.1038/s41598-024-76682-6.

Cited By

1. Cognitive difference text classification in online knowledge collaboration based on SA-BiLSTM hybrid model. Sci Rep. 2025 Jul 1;15(1):22171. doi: 10.1038/s41598-025-06914-w.

References

1. Defense against adversarial attacks: robust and efficient compressed optimized neural networks. Sci Rep. 2024 Mar 17;14(1):6420. doi: 10.1038/s41598-024-56259-z.
2. Autonomous chemical research with large language models. Nature. 2023 Dec;624(7992):570-578. doi: 10.1038/s41586-023-06792-0. Epub 2023 Dec 20.