Sun Qigong, Li Xiufang, Jiao Licheng, Ren Yan, Shang Fanhua, Liu Fang
IEEE Trans Cybern. 2023 Oct;53(10):6187-6199. doi: 10.1109/TCYB.2022.3164285. Epub 2023 Sep 15.
Model quantization reduces model size and computational latency, and it has been successfully applied in many applications on mobile phones, embedded devices, and smart chips. Mixed-precision quantization assigns a different bit precision to each layer according to that layer's sensitivity, which allows it to achieve strong performance. However, it is difficult to quickly determine the quantization bit precision of each layer in a deep neural network under given constraints (for example, hardware resources, energy consumption, model size, and computational latency). In this article, a novel sequential single-path search (SSPS) method for mixed-precision model quantization is proposed, in which the given constraints are introduced to guide the search process. A single-path search cell is proposed and used to build a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to their selection certainties, which exponentially reduces the search space and speeds up the convergence of the search. Experiments show that our method can efficiently search mixed-precision models for different architectures (for example, ResNet-20, ResNet-18, ResNet-34, ResNet-50, and MobileNet-V2) and datasets (for example, CIFAR-10, ImageNet, and COCO) under given constraints, and the experimental results verify that the models found by SSPS significantly outperform their uniform-precision counterparts.
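To make the idea of sequentially fixing precisions by selection certainty more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each layer holds one learnable logit per candidate bit width, that "selection certainty" can be read as the softmax probability of a layer's top candidate, and that model size is the constraint of interest. All function names, the logits, and the parameter counts are hypothetical. In an actual search the logits would be re-optimized by gradient descent between steps; here they are held fixed to keep the example short.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model_size_bytes(param_counts, bits):
    """Weight storage in bytes for a per-layer bit-width assignment."""
    return sum(n * b for n, b in zip(param_counts, bits)) / 8


def search_space_size(candidates):
    """Number of per-layer bit-width assignments still reachable."""
    size = 1
    for layer_candidates in candidates:
        size *= len(layer_candidates)
    return size


def fix_most_certain_layer(candidates, logits):
    """Fix the undecided layer whose top candidate has the highest probability.

    Assumption: this is one possible reading of "selection certainty".
    Mutates both lists in place and returns the index of the fixed layer,
    or None when every layer is already decided.
    """
    best_layer, best_idx, best_p = None, None, -1.0
    for i, (layer_candidates, layer_logits) in enumerate(zip(candidates, logits)):
        if len(layer_candidates) == 1:
            continue  # this layer is already decided
        probs = softmax(layer_logits)
        j = max(range(len(probs)), key=probs.__getitem__)
        if probs[j] > best_p:
            best_layer, best_idx, best_p = i, j, probs[j]
    if best_layer is None:
        return None
    candidates[best_layer] = [candidates[best_layer][best_idx]]
    logits[best_layer] = [logits[best_layer][best_idx]]
    return best_layer


# Hypothetical three-layer network with candidate precisions of 2, 4, and 8 bits.
candidates = [[2, 4, 8], [2, 4, 8], [2, 4, 8]]
logits = [[0.1, 0.2, 3.0], [0.5, 0.4, 0.6], [2.0, 0.1, 0.0]]
param_counts = [1000, 2000, 500]

sizes = [search_space_size(candidates)]
while fix_most_certain_layer(candidates, logits) is not None:
    sizes.append(search_space_size(candidates))

bits = [layer_candidates[0] for layer_candidates in candidates]
print("search space after each step:", sizes)
print("chosen bits per layer:", bits)
print("weight bytes:", model_size_bytes(param_counts, bits))
```

Each fixing step divides the remaining search space by the number of candidates of the fixed layer, which is the sense in which the reduction compounds across layers. A constraint such as a model-size budget could be checked with `model_size_bytes` before a choice is accepted; how the paper actually folds constraints into the search is not specified in the abstract.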