Paula Eileen, Soni Jayesh, Upadhyay Himanshu, Lagos Leonel
Applied Research Center, Florida International University, Miami, 33174, USA.
Department of Electrical and Computer Engineering, Florida International University, Miami, 33174, USA.
Sci Rep. 2025 Jul 2;15(1):23461. doi: 10.1038/s41598-025-07821-w.
The growing computational demands of models such as BERT have raised concerns about their environmental impact. This study addresses the pressing need for sustainable Artificial Intelligence practices by investigating how effectively model compression techniques reduce the energy consumption and carbon emissions of transformer-based models without compromising performance. Specifically, we applied pruning, knowledge distillation, and quantization to transformer-based models (BERT, DistilBERT, ALBERT, and ELECTRA) on the Amazon Polarity dataset for sentiment analysis. We also compared the energy efficiency of these compressed models against inherently carbon-efficient transformer models such as TinyBERT and MobileBERT. To evaluate each model's energy consumption and carbon emissions, we used the open-source tool CodeCarbon. Our findings indicate that model compression reduced energy consumption by 32.097% for BERT with pruning and distillation, [Formula: see text]% for DistilBERT with pruning, 7.12% for ALBERT with quantization, and 23.934% for ELECTRA with pruning and distillation, while keeping accuracy, precision, recall, F1 score, and ROC AUC within a range of 95.871-99.062% for all models except ALBERT with quantization. Specifically, BERT with pruning and distillation achieved 95.90% accuracy, precision, recall, and F1-score, with a 98.87% ROC AUC; DistilBERT with pruning achieved 95.87% accuracy, precision, recall, and F1-score, with a 99.06% ROC AUC; ELECTRA with pruning and distillation achieved 95.92% accuracy, precision, recall, and F1-score, with a 99.30% ROC AUC; and ALBERT with quantization achieved 65.44% accuracy, 67.82% precision, 65.44% recall, a 63.46% F1-score, and a 72.31% ROC AUC, indicating significant performance degradation due to quantization sensitivity in its already compressed architecture. Overall, these results demonstrate the potential of model compression for sustainable Artificial Intelligence practices.
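The abstract names the building blocks of the workflow (pruning and quantization of BERT-family classifiers, with energy and emissions measured by CodeCarbon) but not the exact code. The following is a minimal sketch of that kind of pipeline, not the authors' implementation: the checkpoint name, the 30% pruning amount, the example sentences, and the project name are illustrative assumptions, and the knowledge-distillation step is omitted for brevity.

```python
# Sketch only: magnitude pruning + dynamic quantization of a BERT-family classifier,
# with inference energy/emissions tracked by CodeCarbon. Hyperparameters are assumed,
# not taken from the paper.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from codecarbon import EmissionsTracker

model_name = "bert-base-uncased"  # assumed checkpoint; the paper also studies DistilBERT, ALBERT, ELECTRA
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# 1) Unstructured L1 magnitude pruning of every Linear layer (30% sparsity is an assumption).
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# 2) Post-training dynamic quantization of Linear layers to int8 for CPU inference.
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
model.eval()

# 3) Track energy use and CO2 emissions for a batch of sentiment predictions.
texts = [
    "Great product, works exactly as described.",   # illustrative review-style inputs
    "Terrible quality, broke after one day.",
]
tracker = EmissionsTracker(project_name="bert-compression-demo")
tracker.start()
with torch.no_grad():
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    preds = model(**inputs).logits.argmax(dim=-1)
emissions_kg = tracker.stop()  # estimated kg CO2-eq for the tracked block
print(f"Predictions: {preds.tolist()}, estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

In this setup the compressed and uncompressed variants would be evaluated under the same tracker so that their energy and emissions figures are directly comparable, which is the kind of comparison the reported percentage reductions summarize.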