IEEE Trans Pattern Anal Mach Intell. 2022 Aug;44(8):4035-4051. doi: 10.1109/TPAMI.2021.3066410. Epub 2022 Jul 1.
We study network pruning, which aims to remove redundant channels/kernels and hence speed up the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones. Both strategies suffer from limitations: the former is computationally expensive and difficult to converge, while the latter optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a simple yet effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power. To this end, we first introduce additional discrimination-aware losses into the network to increase the discriminative power of the intermediate layers. Next, we select the most discriminative channels for each layer by considering the discrimination-aware loss and the reconstruction error simultaneously. We then formulate channel pruning as a sparsity-inducing optimization problem with a convex objective and propose a greedy algorithm to solve the resultant problem. Note that a channel (a 3D tensor) often consists of a set of kernels (each a 2D matrix). Besides the redundancy among channels, some kernels within a channel may also be redundant and fail to contribute to the discriminative power of the network, resulting in kernel-level redundancy. To address this issue, we propose a discrimination-aware kernel pruning (DKP) method to further compress deep networks by removing redundant kernels. To avoid manually determining the pruning rate for each layer, we propose two adaptive stopping conditions to automatically determine the number of selected channels/kernels. In practice, the proposed adaptive stopping conditions tend to yield more efficient models with better performance.
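The core idea above — greedily selecting channels that reduce a joint objective combining the reconstruction error and an auxiliary discrimination-aware loss, with an adaptive stopping condition — can be illustrated with a minimal sketch. This is not the authors' implementation: the joint objective is simplified to two least-squares targets (`y_feat` for reconstruction, `y_disc` as a stand-in for the discrimination-aware loss), and all names and the tolerance-based stopping rule are illustrative assumptions.

```python
import numpy as np

def greedy_channel_selection(X, y_feat, y_disc, lam=1.0, max_channels=None, tol=1e-4):
    """Greedy sketch of discrimination-aware channel selection.

    X        : (n, C) per-channel feature contributions (toy stand-in for feature maps)
    y_feat   : (n,) reconstruction target (pre-trained model's response)
    y_disc   : (n,) target of the auxiliary discrimination-aware loss
    lam      : trade-off between reconstruction and discrimination terms
    tol      : adaptive stopping condition -- stop when the relative
               improvement of the joint objective falls below tol

    At each step, the channel whose inclusion most reduces
        ||X_S w - y_feat||^2 + lam * ||X_S w - y_disc||^2
    is added to the selected set S.
    """
    n, C = X.shape
    if max_channels is None:
        max_channels = C

    def joint_obj(S):
        # Joint objective for a candidate channel subset S,
        # solved in closed form via stacked least squares.
        if not S:
            return float(y_feat @ y_feat + lam * (y_disc @ y_disc))
        Xs = X[:, S]
        A = np.vstack([Xs, np.sqrt(lam) * Xs])
        b = np.concatenate([y_feat, np.sqrt(lam) * y_disc])
        w, *_ = np.linalg.lstsq(A, b, rcond=None)
        r = A @ w - b
        return float(r @ r)

    selected = []
    prev = joint_obj(selected)
    while len(selected) < max_channels:
        # Greedy step: try every unselected channel, keep the best.
        best_c, best_val = None, prev
        for c in range(C):
            if c in selected:
                continue
            val = joint_obj(selected + [c])
            if val < best_val:
                best_c, best_val = c, val
        # Adaptive stopping: quit when no channel helps enough.
        if best_c is None or (prev - best_val) < tol * max(prev, 1e-12):
            break
        selected.append(best_c)
        prev = best_val
    return selected
```

In this toy setting, if only a few channels actually carry the signal, the greedy loop picks exactly those and the stopping condition halts selection before redundant channels are added — mirroring how the adaptive conditions avoid a hand-tuned per-layer pruning rate.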
Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 model with a 30 percent reduction in channels even outperforms the baseline model by 0.36 percent in Top-1 accuracy. We also deploy the pruned models on a smartphone equipped with a Qualcomm Snapdragon 845 processor: the pruned MobileNetV1 and MobileNetV2 achieve 1.93× and 1.42× inference acceleration on the device, respectively, with negligible performance degradation. The source code and the pre-trained models are available at https://github.com/SCUT-AILab/DCP.