IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4430-4446. doi: 10.1109/TPAMI.2022.3194044. Epub 2023 Mar 7.
Dynamic networks have shown their promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference. However, their practical runtime usually lags behind the theoretical acceleration due to inefficient sparsity. In this paper, we explore a hardware-efficient dynamic inference regime, named dynamic weight slicing, that generalizes well across multiple dimensions in both CNNs and transformers (e.g., kernel size, embedding dimension, number of heads, etc.). Instead of adaptively selecting important weight elements in a sparse way, we pre-define dense weight slices with different importance levels by nested residual learning. During inference, weights are progressively sliced from the most important elements to less important ones to achieve different model capacities for inputs with diverse difficulty levels. Based on this conception, we present DS-CNN++ and DS-ViT++ by carefully designing the double-headed dynamic gate and the overall network architecture. We further propose dynamic idle slicing to address the drastic reduction of embedding dimension in DS-ViT++. To ensure sub-network generality and routing fairness, we propose a disentangled two-stage optimization scheme. In Stage I, in-place bootstrapping (IB) and multi-view consistency (MvCo) are proposed to stabilize and improve the training of the DS-CNN++ and DS-ViT++ supernets, respectively. In Stage II, sandwich gate sparsification (SGS) is proposed to assist the gate training. Extensive experiments on 4 datasets and 3 different network architectures demonstrate that our methods consistently outperform state-of-the-art static and dynamic model compression methods by a large margin (up to 6.6%). Typically, we achieve 2-4× computation reduction and up to 61.5% real-world acceleration on MobileNet, ResNet-50 and Vision Transformer, with minimal accuracy drops on ImageNet. Code release: https://github.com/changlin31/DS-Net.
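The core idea of dynamic weight slicing can be sketched as follows. This is an illustrative example only, not the released DS-Net code: the `SlicableLinear` class and `ratio` parameter are hypothetical names, and the assumption that the leading units are the most important ones stands in for what the paper enforces during training via nested residual learning.

```python
import numpy as np

rng = np.random.default_rng(0)

class SlicableLinear:
    """Linear layer whose output width can be sliced at inference time.

    A single dense weight matrix is stored; at inference only its leading
    rows (the most important slice, by assumption) are used, so the sliced
    computation remains a dense, contiguous matmul -- hardware friendly,
    unlike element-wise sparse selection.
    """

    def __init__(self, in_dim, out_dim):
        self.weight = rng.standard_normal((out_dim, in_dim))

    def forward(self, x, ratio=1.0):
        # Keep only the leading `ratio` fraction of output units.
        # Easy inputs would be routed (by a dynamic gate) to a small ratio,
        # hard inputs to the full width.
        k = max(1, int(self.weight.shape[0] * ratio))
        return x @ self.weight[:k].T  # dense matmul on a contiguous slice

layer = SlicableLinear(in_dim=8, out_dim=16)
x = rng.standard_normal((2, 8))

full = layer.forward(x, ratio=1.0)   # full capacity: shape (2, 16)
half = layer.forward(x, ratio=0.5)   # reduced capacity: shape (2, 8)

# The sliced output equals the leading part of the full output:
# sub-networks are nested rather than sparse.
assert np.allclose(half, full[:, :8])
```

In the actual method, the slicing ratio per layer is chosen at runtime by the double-headed dynamic gate, and nesting across slices is what lets one set of weights serve many sub-networks.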