Geng Xue, Wang Zhe, Chen Chunyun, Xu Qing, Xu Kaixin, Jin Chao, Gupta Manas, Yang Xulei, Chen Zhenghua, Sabry Aly Mohamed M, Lin Jie, Wu Min, Li Xiaoli
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):5837-5857. doi: 10.1109/TNNLS.2024.3394494. Epub 2025 Apr 4.
Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to their huge memory, energy, and computation costs. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge of research on compression methods that achieve model efficiency while retaining performance. Furthermore, a growing number of works focus on customizing DNN hardware accelerators to better exploit model compression techniques. In addition to efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related work can be overwhelming. This motivates us to conduct a comprehensive survey of recent research toward the goal of high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers the mainstream model compression techniques: model quantization, model pruning, knowledge distillation, and optimizations of nonlinear operations. We then introduce recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. In addition, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several open issues, such as hardware evaluation, generalization, and the integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs from the algorithm, hardware accelerator, and security perspectives.