IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4430-4446. doi: 10.1109/TPAMI.2022.3194044. Epub 2023 Mar 7.
Dynamic networks have shown their promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference. However, their practical runtime usually lags behind the theoretical acceleration due to inefficient sparsity. In this paper, we explore a hardware-efficient dynamic inference regime, named dynamic weight slicing, that generalizes well across multiple dimensions in both CNNs and transformers (e.g., kernel size, embedding dimension, number of heads, etc.). Instead of adaptively selecting important weight elements in a sparse way, we pre-define dense weight slices with different importance levels by nested residual learning. During inference, weights are progressively sliced from the most important elements to less important ones to achieve different model capacities for inputs with diverse difficulty levels. Based on this conception, we present DS-CNN++ and DS-ViT++ by carefully designing the double-headed dynamic gate and the overall network architecture. We further propose dynamic idle slicing to address the drastic reduction of embedding dimension in DS-ViT++. To ensure sub-network generality and routing fairness, we propose a disentangled two-stage optimization scheme. In Stage I, in-place bootstrapping (IB) and multi-view consistency (MvCo) are proposed to stabilize and improve the training of the DS-CNN++ and DS-ViT++ supernets, respectively. In Stage II, sandwich gate sparsification (SGS) is proposed to assist the gate training. Extensive experiments on 4 datasets and 3 different network architectures demonstrate that our methods consistently outperform state-of-the-art static and dynamic model compression methods by a large margin (up to 6.6%). Typically, we achieve 2-4× computation reduction and up to 61.5% real-world acceleration on MobileNet, ResNet-50 and Vision Transformer, with minimal accuracy drops on ImageNet. Code release: https://github.com/changlin31/DS-Net.
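The core idea of dynamic weight slicing can be sketched as follows. This is an illustrative example only, not the released DS-Net code: the `SlicableLinear` class and `ratio` parameter are hypothetical names, and the assumption that the leading units are the most important ones stands in for what the paper enforces during training via nested residual learning.

```python
import numpy as np

rng = np.random.default_rng(0)

class SlicableLinear:
    """Linear layer whose output width can be sliced at inference time.

    A single dense weight matrix is stored; at inference only its leading
    rows (the most important slice, by assumption) are used, so the sliced
    computation remains a dense, contiguous matmul -- hardware friendly,
    unlike element-wise sparse selection.
    """

    def __init__(self, in_dim, out_dim):
        self.weight = rng.standard_normal((out_dim, in_dim))

    def forward(self, x, ratio=1.0):
        # Keep only the leading `ratio` fraction of output units.
        # Easy inputs would be routed (by a dynamic gate) to a small ratio,
        # hard inputs to the full width.
        k = max(1, int(self.weight.shape[0] * ratio))
        return x @ self.weight[:k].T  # dense matmul on a contiguous slice

layer = SlicableLinear(in_dim=8, out_dim=16)
x = rng.standard_normal((2, 8))

full = layer.forward(x, ratio=1.0)   # full capacity: shape (2, 16)
half = layer.forward(x, ratio=0.5)   # reduced capacity: shape (2, 8)

# The sliced output equals the leading part of the full output:
# sub-networks are nested rather than sparse.
assert np.allclose(half, full[:, :8])
```

In the actual method, the slicing ratio per layer is chosen at runtime by the double-headed dynamic gate, and nesting across slices is what lets one set of weights serve many sub-networks.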