IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):2891-2904. doi: 10.1109/TPAMI.2020.3020300. Epub 2021 Aug 4.
Recently, neural architecture search (NAS) has attracted great interest in both academia and industry. However, it remains challenging because of its huge and non-continuous search space. Instead of applying evolutionary algorithms or reinforcement learning as in previous works, this paper proposes a direct sparse optimization NAS (DSO-NAS) method. The motivation behind DSO-NAS is to address the task from the perspective of model pruning. To achieve this goal, we start from a completely connected block and then introduce scaling factors to scale the information flow between operations. Next, sparse regularization is imposed to prune useless connections in the architecture. Finally, an efficient and theoretically sound optimization method is derived to solve the resulting problem. Our method enjoys the advantages of both differentiability and efficiency; therefore, it can be directly applied to large datasets such as ImageNet and to tasks beyond classification. In particular, on the CIFAR-10 dataset DSO-NAS achieves an average test error of 2.74 percent, while on the ImageNet dataset it achieves a 25.4 percent test error under 600M FLOPs, using 8 GPUs in 18 hours. On the semantic segmentation task, DSO-NAS also achieves competitive results compared with manually designed architectures on the PASCAL VOC dataset. Code is available at https://github.com/XinbangZhang/DSO-NAS.
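The following is a minimal, illustrative sketch of the scaling-factor idea described in the abstract, not the authors' implementation: each candidate operation in a completely connected block is gated by a learnable scaling factor, an L1 penalty pushes the factors toward zero (a simplified stand-in for the paper's theoretically grounded sparse optimization), and connections whose factors collapse are pruned. All names (ScaledDenseBlock, l1_weight, prune_threshold) and the plain SGD training step are assumptions for illustration.

```python
# Illustrative sketch only (assumed names; not the DSO-NAS reference code).
import torch
import torch.nn as nn

class ScaledDenseBlock(nn.Module):
    def __init__(self, channels, num_ops=4):
        super().__init__()
        # Candidate operations; each output is gated by a scalar factor.
        self.ops = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_ops)]
        )
        self.scales = nn.Parameter(torch.ones(num_ops))  # scaling factors

    def forward(self, x):
        # Sum of scaled operation outputs: a zero scale means a pruned connection.
        return sum(s * op(x) for s, op in zip(self.scales, self.ops))

def sparsity_penalty(block, l1_weight=1e-3):
    # L1 regularization on the scaling factors (simplification of the
    # paper's sparse-optimization solver).
    return l1_weight * block.scales.abs().sum()

def surviving_ops(block, prune_threshold=1e-2):
    # Connections whose scaling factor has collapsed to ~0 are removed.
    keep = (block.scales.detach().abs() > prune_threshold).nonzero().flatten()
    return keep.tolist()

# Usage sketch: one optimization step on dummy data.
block = ScaledDenseBlock(channels=16)
opt = torch.optim.SGD(block.parameters(), lr=0.01)
x = torch.randn(2, 16, 8, 8)
loss = block(x).pow(2).mean() + sparsity_penalty(block)
loss.backward()
opt.step()
print("surviving ops:", surviving_ops(block))
```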