Fang Shuangkang, Xu Weixin, Feng Zipeng, Yuan Song, Wang Yufeng, Yang Yi, Ding Wenrui, Zhou Shuchang
School of Electrical and Information Engineering, Beihang University, Beijing, 100191, China.
Megvii Research, Megvii Inc., Beijing, 100096, China.
Neural Netw. 2025 Jul;187:107384. doi: 10.1016/j.neunet.2025.107384. Epub 2025 Mar 18.
The significant computational demands of Deep Neural Networks (DNNs) present a major challenge for their practical application. Recently, many Application-Specific Integrated Circuit (ASIC) chips have incorporated dedicated hardware support for neural network acceleration. However, the lengthy development cycle of ASIC chips means they often lag behind the latest advances in neural architecture research. For instance, Layer Normalization is not well supported on many popular chips, and the efficiency of 7 × 7 convolution is significantly lower than that of three equivalent 3 × 3 convolutions. Therefore, in this paper, we introduce Arch-Net, a neural network framework composed exclusively of a select few common operators, namely 3 × 3 Convolution, 2 × 2 Max-pooling, Batch Normalization, Fully Connected layers, and Concatenation, which are efficiently supported across the majority of ASIC architectures. To facilitate the conversion of disparate network architectures into Arch-Net, we propose the Arch-Distillation methodology, which incorporates strategies such as Residual Feature Adaptation and the Teacher Attention Mechanism. These mechanisms enable effective conversion between different network structures alongside efficient model quantization. The resultant Arch-Net eliminates unconventional network constructs while maintaining robust performance even under sub-8-bit quantization, thereby enhancing compatibility and deployment efficiency. Empirical results from image classification and machine translation tasks demonstrate that using only a few types of operators in Arch-Net can achieve results comparable to those obtained with complex architectures. This offers new insight into deploying structure-agnostic neural networks on various ASIC chips.
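The substitution the abstract mentions, replacing a 7 × 7 convolution with three stacked 3 × 3 convolutions, can be checked with simple arithmetic: the stack reaches the same effective receptive field while using fewer weights. The sketch below illustrates only this arithmetic; the function names are our own and it is not the paper's implementation.

```python
# Illustrative arithmetic only (not the Arch-Net implementation):
# why three stacked 3x3 convolutions can stand in for one 7x7.

def receptive_field(kernel_sizes):
    """Effective receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each kxk layer widens the field by k-1 pixels
    return rf

def conv_params(kernel_size, channels):
    """Weight count of a square conv with equal in/out channels, no bias."""
    return kernel_size * kernel_size * channels * channels

channels = 64  # hypothetical channel width, for illustration
rf_stack = receptive_field([3, 3, 3])    # 7, same as a single 7x7 conv
p_single = conv_params(7, channels)      # 49 * channels^2 weights
p_stack = 3 * conv_params(3, channels)   # 27 * channels^2 weights

print(rf_stack, p_single, p_stack)  # 7 200704 110592
```

The stack covers the same 7 × 7 window with roughly 45% fewer weights (27C² vs. 49C²), and, as the abstract notes, 3 × 3 convolution is typically the better-optimized operator on ASIC accelerators.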