
Mixed precision quantization based on information entropy.

Authors

Qin Ting, Li Zhao, Zhao Jiaqi, Yan Yuting, Du Yafei

Affiliation

School of Computer Science and Technology, Shandong University of Technology, Zibo, 255049, China.

Publication

Sci Rep. 2025 Apr 15;15(1):12974. doi: 10.1038/s41598-025-91684-8.

Abstract

Mixed precision quantization is a technique that markedly reduces a system's computational and memory demands by lowering the bit width of the model. In practice, however, an improper allocation strategy can fail to exploit the advantages of quantization, wasting computational resources and degrading model performance. We propose a bit-width allocation method based on information entropy to mitigate the precision loss caused by quantization. During the forward pass, the entropy of each layer's output is computed, and a sliding window smooths these entropy values. A dynamic threshold derived from the smoothed average entropy of each layer then determines the bit width allocated to that layer. The threshold and the sliding-window size are treated as hyperparameters and optimized with Optuna under a model-accuracy constraint, automating the bit-width allocation across layers. Finally, we integrate knowledge distillation: a larger teacher model guides the training of the quantized model, transferring soft labels and deeper knowledge so that performance remains high despite compression. Experiments on ResNet20, ResNet32, and ResNet56 show that the method effectively reduces the bit widths of weights and activations to 3.6M/3.6MP while maintaining model accuracy. The maximum accuracy loss on the CIFAR-100 dataset is only 0.6%, and accuracy comparable to the full-precision model is achieved on CIFAR-10, demonstrating the method's effectiveness in balancing model compression and performance.
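As a rough illustration of the allocation step described above, the following Python sketch estimates each layer's output entropy from a histogram, smooths the per-layer entropies with a moving-average window, and assigns a higher bit width to layers whose smoothed entropy exceeds a dynamic threshold. This is a minimal sketch, not the authors' code: the histogram-based entropy estimate, the mean-based threshold rule, and the candidate bit widths (3 and 5 bits) are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of entropy-based bit-width allocation.
import numpy as np

def activation_entropy(x: np.ndarray, num_bins: int = 256) -> float:
    """Shannon entropy of a layer output, estimated from a histogram."""
    hist, _ = np.histogram(x, bins=num_bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def smooth_entropies(entropies, window=3):
    """Sliding-window (moving-average) smoothing over per-layer entropies."""
    smoothed = []
    for i in range(len(entropies)):
        lo = max(0, i - window + 1)
        smoothed.append(sum(entropies[lo:i + 1]) / (i + 1 - lo))
    return smoothed

def allocate_bit_widths(entropies, window=3, threshold_scale=1.0,
                        low_bits=3, high_bits=5):
    """Give more bits to layers whose smoothed entropy exceeds a dynamic
    threshold (here: the scaled mean of the smoothed entropies)."""
    smoothed = smooth_entropies(entropies, window)
    threshold = threshold_scale * (sum(smoothed) / len(smoothed))
    return [high_bits if h >= threshold else low_bits for h in smoothed]

# Example: entropies of six layer outputs collected during one forward pass.
layer_entropies = [activation_entropy(np.random.randn(4096)) for _ in range(6)]
print(allocate_bit_widths(layer_entropies, window=3, threshold_scale=1.0))
```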

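The abstract states that the threshold and the sliding-window size are tuned with Optuna subject to model accuracy. A minimal sketch of such a search follows; the search ranges and the use of validation accuracy as the maximized objective are assumptions, and `evaluate_quantized_model` is a hypothetical placeholder for quantizing and evaluating the network with a given setting (for example, via `allocate_bit_widths` from the sketch above).

```python
# Minimal sketch (not the paper's implementation) of the Optuna hyperparameter search.
import optuna

def evaluate_quantized_model(threshold_scale: float, window: int) -> float:
    """Hypothetical helper: derive a bit-width allocation with these settings,
    quantize and fine-tune the network, and return validation accuracy.
    Returns a dummy value here so the sketch runs stand-alone."""
    return 0.0

def objective(trial: optuna.Trial) -> float:
    threshold_scale = trial.suggest_float("threshold_scale", 0.5, 1.5)  # assumed range
    window = trial.suggest_int("window", 1, 7)                          # assumed range
    return evaluate_quantized_model(threshold_scale, window)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```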

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a7c/12000403/913031f7f5d5/41598_2025_91684_Fig1_HTML.jpg
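The knowledge-distillation step, in which a teacher transfers soft labels to the quantized student, is commonly implemented as a temperature-softened KL-divergence term mixed with the ordinary cross-entropy loss. The PyTorch sketch below shows that standard formulation; the temperature, the mixing weight `alpha`, and the exact loss used by the authors are assumptions.

```python
# Minimal sketch of a standard soft-label distillation loss (not the paper's exact loss).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    distributions, combined with cross-entropy on the hard labels."""
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example with random logits for a 10-class problem (CIFAR-10 sized).
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels).item())
```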
