IEEE Trans Image Process. 2023;32:3912-3923. doi: 10.1109/TIP.2023.3288986. Epub 2023 Jul 17.
Neurologically, filter pruning is a procedure of forgetting and of recovering remembering. Prevailing methods first forget less important information directly from an unrobust baseline and expect the performance sacrifice to be minimal. However, unsaturated baseline remembering imposes a ceiling on the slimmed model, leading to suboptimal performance, and aggressive forgetting at the outset causes unrecoverable information loss. Here, we design a novel filter pruning paradigm termed Remembering Enhancement and Entropy-based Asymptotic Forgetting (REAF). Inspired by robustness theory, we first enhance remembering by over-parameterizing the baseline with fusible compensatory convolutions, which liberates the pruned model from the bondage of the baseline at no inference cost. The collateral implication between the original and compensatory filters then necessitates a bilateral-collaborated pruning criterion: a filter and its compensatory counterpart are preserved only when the former has the largest intra-branch distance and the latter has the strongest remembering-enhancement power. Further, Ebbinghaus-curve-based asymptotic forgetting is proposed to protect the pruned model from unstable learning. The number of pruned filters increases asymptotically during training, so the remembering of the pretrained weights is gradually concentrated in the remaining filters. Extensive experiments demonstrate the superiority of REAF over many state-of-the-art (SOTA) methods. For example, REAF removes 47.55% of the FLOPs and 42.98% of the parameters of ResNet-50 with only a 0.98% top-1 accuracy loss on ImageNet. The code is available at https://github.com/zhangxin-xd/REAF.
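Why compensatory convolutions add no inference cost: convolution is linear in its kernel, so two parallel branches whose outputs are summed can be folded into a single filter by summing the kernels. A minimal 1-D sketch of this fusion (the variable names and the single-channel setting are illustrative, not REAF's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(32)        # input signal
w_orig = rng.standard_normal(3)    # original filter
w_comp = rng.standard_normal(3)    # compensatory filter (same size/stride/padding)

# Training time: two parallel branches, outputs summed.
y_branches = (np.convolve(x, w_orig, mode="valid")
              + np.convolve(x, w_comp, mode="valid"))

# Inference time: one fused filter, because convolution is linear in the kernel.
y_fused = np.convolve(x, w_orig + w_comp, mode="valid")

print(np.allclose(y_branches, y_fused))  # True
```

The same identity underlies structural re-parameterization in 2-D networks: the over-parameterized model is trained with extra branches, then collapsed to the original topology before deployment.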
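The asymptotic forgetting schedule can be pictured with the Ebbinghaus retention curve, where retention decays as exp(-t/τ). A hypothetical sketch (the exact schedule, the time constant `tau`, and the function name are assumptions for illustration, not the paper's formula): the number of pruned filters grows toward its target as 1 − exp(-t/τ), so most capacity is removed only after the remaining filters have absorbed the pretrained remembering.

```python
import math

def pruned_filters(epoch: int, total_to_prune: int, tau: float = 20.0) -> int:
    """Hypothetical asymptotic pruning schedule shaped like the Ebbinghaus
    forgetting curve: retention ~ exp(-epoch/tau), so the number of
    forgotten (pruned) filters grows as 1 - exp(-epoch/tau)."""
    return round(total_to_prune * (1.0 - math.exp(-epoch / tau)))

# Pruning 256 filters over 100 epochs: fast early growth, then saturation.
schedule = [pruned_filters(e, total_to_prune=256) for e in range(0, 101, 20)]
print(schedule)  # [0, 162, 221, 243, 251, 254]
```

The schedule is monotonically non-decreasing and approaches, but never overshoots, the pruning target, which matches the abstract's description of gradually concentrating remembering in the surviving filters.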