IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):2921-2935. doi: 10.1109/TPAMI.2020.3035351. Epub 2021 Aug 4.
One-shot neural architecture search (NAS) has recently become mainstream in the NAS community because it significantly improves computational efficiency through weight sharing. However, the supernet training paradigm in one-shot NAS introduces catastrophic forgetting: each training step can deteriorate the performance of other architectures that partially share weights with the current architecture. To overcome this catastrophic forgetting, we formulate supernet training for one-shot NAS as a constrained continual learning optimization problem, such that learning the current architecture does not degrade the validation accuracy of previously visited architectures. The key to solving this constrained optimization problem is a novelty search based architecture selection (NSAS) loss function that regularizes supernet training with the most representative subset of architectures found by a greedy novelty search. We apply the NSAS loss function to two one-shot NAS baselines and test them extensively on both a common search space and a NAS benchmark dataset. We further derive three variants of the NSAS loss function: NSAS with a depth constraint (NSAS-C) to improve transferability, and NSAS-G and NSAS-LG to handle situations with a limited number of constraints. Experiments on the common NAS search space demonstrate that NSAS and its variants improve the predictive ability of supernet training in one-shot NAS, with remarkable and efficient performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. Results on the NAS benchmark dataset also confirm the significant improvements these one-shot NAS baselines achieve.
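The following is a minimal sketch of the idea described above, not the paper's implementation: a supernet update whose loss on the currently sampled architecture is augmented by a regularizer over a small, diverse subset of previously sampled architectures chosen by a greedy novelty heuristic. The helper names (greedy_novelty_subset, nsas_training_step), the supernet's call signature supernet(x, arch), and the use of Hamming distance as the novelty metric are illustrative assumptions.

```python
# Hypothetical sketch of NSAS-style regularized supernet training (PyTorch-style).
import torch
import torch.nn.functional as F


def architecture_distance(a, b):
    # Hamming distance between two architecture encodings (lists of op indices);
    # assumed stand-in for the paper's novelty measure.
    return sum(x != y for x, y in zip(a, b))


def greedy_novelty_subset(history, k):
    # Greedily pick k architectures that are maximally spread out in encoding
    # space, approximating a "most representative subset" used as constraints.
    subset = [history[0]]
    while len(subset) < min(k, len(history)):
        best = max(
            (a for a in history if a not in subset),
            key=lambda a: min(architecture_distance(a, s) for s in subset),
        )
        subset.append(best)
    return subset


def nsas_training_step(supernet, optimizer, batch, current_arch, constraint_archs, lam=1.0):
    # One supernet update: loss of the sampled architecture plus a penalty that
    # keeps the constraint subset performing well under the shared weights.
    x, y = batch
    loss = F.cross_entropy(supernet(x, current_arch), y)
    reg = sum(F.cross_entropy(supernet(x, a), y) for a in constraint_archs)
    total = loss + lam * reg / max(len(constraint_archs), 1)
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```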