Ruan Xiaofeng, Liu Yufan, Yuan Chunfeng, Li Bing, Hu Weiming, Li Yangxi, Maybank Stephen
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4499-4513. doi: 10.1109/TNNLS.2020.3018177. Epub 2021 Oct 5.
Model compression methods, which aim to alleviate the heavy computational burden of deep neural networks (DNNs) in real-world applications, have become popular in recent years. However, most existing compression methods have two limitations: 1) they usually adopt a cumbersome pipeline of pretraining, training with a sparsity constraint, pruning/decomposition, and fine-tuning, with the last three stages typically iterated multiple times; and 2) the models are pretrained under explicit sparsity or low-rank assumptions, whose wide applicability is difficult to guarantee. In this article, we propose an efficient decomposition and pruning (EDP) scheme that constructs a compressed-aware block to automatically minimize the rank of the weight matrix and identify redundant channels. Specifically, we embed the compressed-aware block by decomposing one network layer into two layers: a new weight matrix layer and a coefficient matrix layer. By imposing regularizers on the coefficient matrix, the new weight matrix learns to become a low-rank basis weight, and its corresponding channels become sparse. In this way, the proposed compressed-aware block achieves low-rank decomposition and channel pruning simultaneously, in a single data-driven training stage. Moreover, the network architecture is further compressed and optimized by a novel Pruning & Merging (PM) module, which prunes redundant channels and merges redundant decomposed layers. Experimental results on different data sets and networks, against 17 competing methods, demonstrate that the proposed EDP achieves a high compression ratio with acceptable accuracy degradation and outperforms the state of the art in compression rate, accuracy, inference time, and run-time memory.
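A minimal sketch of the compressed-aware block described in the abstract, assuming PyTorch. The class name CompressedAwareConv, the method group_regularizer, and the exact form of the group regularizer are illustrative assumptions, not the authors' released implementation:

    import torch
    import torch.nn as nn

    class CompressedAwareConv(nn.Module):
        # Sketch of a compressed-aware block: the original KxK convolution is
        # decomposed into a KxK "basis" convolution with `rank` filters,
        # followed by a 1x1 "coefficient" convolution. Group penalties on the
        # coefficient matrix let columns (basis filters, i.e. rank) and rows
        # (output channels) shrink toward zero, so low-rank decomposition and
        # channel pruning emerge from a single data-driven training stage.
        def __init__(self, in_ch, out_ch, k, rank):
            super().__init__()
            self.basis = nn.Conv2d(in_ch, rank, k, padding=k // 2, bias=False)
            self.coeff = nn.Conv2d(rank, out_ch, kernel_size=1, bias=False)

        def forward(self, x):
            return self.coeff(self.basis(x))

        def group_regularizer(self):
            # Coefficient matrix C has shape (out_ch, rank) after flattening.
            C = self.coeff.weight.flatten(1)
            col_norms = C.norm(dim=0)  # a zeroed column drops a basis filter (rank reduction)
            row_norms = C.norm(dim=1)  # a zeroed row prunes an output channel
            return col_norms.sum() + row_norms.sum()

    # Usage: add the regularizer to the task loss during the single training stage.
    block = CompressedAwareConv(in_ch=64, out_ch=128, k=3, rank=32)
    x = torch.randn(2, 64, 32, 32)
    y = block(x)                              # shape: (2, 128, 32, 32)
    reg = 1e-4 * block.group_regularizer()    # weighted and added to the task loss
    print(y.shape, reg.item())

After training, rows and columns of the coefficient matrix whose group norms fall below a threshold would be removed, and decomposed layers that achieve little rank reduction could be merged back into a single layer; in the paper, this post-processing role is played by the Pruning & Merging (PM) module.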