School of Computer Science and Technology, Xidian University, Xi'an, 710071, China; The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, 710071, China.
Neural Netw. 2023 Oct;167:787-797. doi: 10.1016/j.neunet.2023.08.053. Epub 2023 Sep 4.
Designing efficient and accurate network architectures to support various workloads, from servers to edge devices, is a fundamental problem as Convolutional Neural Networks (ConvNets) become increasingly widespread. One simple yet effective method is to scale ConvNets by systematically adjusting the dimensions of a baseline network, including width, depth, and resolution, so that the network adapts to diverse workloads by varying its computational complexity and representation ability. However, current state-of-the-art (SOTA) scaling methods for neural network architectures overlook the inter-dimensional relationships within the network and the impact of scaling on inference speed, resulting in suboptimal trade-offs between accuracy and inference speed. To overcome these limitations, we propose a scaling method for ConvNets that utilizes dimension relationships and runtime proxy constraints to improve both accuracy and inference speed. Specifically, we observe that higher input resolutions in convolutional layers lead to redundant filters (convolutional width) because information at different spatial positions becomes more similar, suggesting a potential benefit in reducing the number of filters while increasing the input resolution. Based on this observation, we empirically quantify the relationship between width and resolution, enabling our scaling strategy to prioritize models with higher parametric efficiency. Furthermore, we introduce a novel runtime prediction model that focuses on fine-grained layer tasks with different computational properties to identify efficient network configurations more accurately. Comprehensive experiments show that our method outperforms prior work in creating a family of models with favorable trade-offs between accuracy and inference speed on the ImageNet dataset across various ConvNets.
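The abstract's central idea, scaling width, depth, and resolution jointly while letting a higher resolution justify a narrower network, can be sketched as follows. This is a minimal illustration, not the paper's actual method: the compound coefficients `alpha`, `beta`, `gamma` follow the EfficientNet-style convention, and the coupling exponent `kappa`, which trades width away as resolution grows, is a hypothetical stand-in for the width-resolution relationship the paper quantifies empirically.

```python
def compound_scale(width, depth, resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    # Conventional compound scaling: every dimension grows
    # independently by its coefficient raised to the exponent phi.
    return (round(width * beta ** phi),
            round(depth * alpha ** phi),
            round(resolution * gamma ** phi))


def coupled_scale(width, depth, resolution, phi,
                  alpha=1.2, gamma=1.15, kappa=0.5):
    # Hypothetical width-resolution coupling (illustrative only):
    # as resolution grows by res_factor, width is scaled *down* by
    # res_factor ** -kappa, reflecting the observation that higher
    # resolutions make filters redundant. kappa = 0 recovers a
    # constant width; larger kappa trims width more aggressively.
    res_factor = gamma ** phi
    width_factor = res_factor ** (-kappa)
    return (round(width * width_factor),
            round(depth * alpha ** phi),
            round(resolution * res_factor))
```

For example, starting from a stage with 64 filters, depth 10, and 224x224 input at `phi = 2`, plain compound scaling widens the network to 77 filters, while the coupled variant narrows it to 56 filters at the same higher resolution of 296, spending the parameter budget where the abstract argues it is more efficient.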