

Towards Accurate and Compact Architectures via Neural Architecture Transformer.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6501-6516. doi: 10.1109/TPAMI.2021.3086914. Epub 2022 Sep 14.

Abstract

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-designed/searched architecture may still contain many nonsignificant or redundant modules/operations (e.g., some intermediate convolution or pooling layers). Such redundancy may not only incur substantial memory consumption and computational cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP) and seeks to replace the redundant operations with more efficient operations, such as skip or null connection. Note that NAT only considers a small number of possible replacements/transitions and thus comes with a limited search space. As a result, such a small search space may hamper the performance of architecture optimization. To address this issue, we propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization. Specifically, we present a two-level transition rule to obtain valid transitions, i.e., allowing operations to have more efficient types (e.g., convolution → separable convolution) or smaller kernel sizes (e.g., 5×5 → 3×3). Note that different operations may have different valid transitions. We further propose a Binary-Masked Softmax (BMSoftmax) layer to omit the possible invalid transitions. Last, based on the MDP formulation, we apply policy gradient to learn an optimal policy, which will be used to infer the optimized architectures. Extensive experiments show that the transformed architectures significantly outperform both their original counterparts and the architectures optimized by existing methods.
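The Binary-Masked Softmax (BMSoftmax) layer mentioned in the abstract can be illustrated with a small sketch. The code below is a hypothetical illustration, not the authors' implementation: it assumes each operation comes with a vector of transition logits and a 0/1 validity mask derived from the two-level transition rule, and it shows how invalid transitions can be excluded by assigning them exactly zero probability.

```python
# Minimal sketch of a binary-masked softmax over candidate transitions.
# Names, shapes, and the example candidates are assumptions for illustration only.
import numpy as np

def binary_masked_softmax(logits, valid_mask):
    """Softmax restricted to valid transitions.

    logits:     (num_candidates,) raw scores for each candidate transition.
    valid_mask: (num_candidates,) 1 for a valid transition, 0 for an invalid one.
    Returns a probability vector that assigns zero mass to invalid entries.
    """
    logits = np.asarray(logits, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=float)
    # Send invalid entries to -inf so exp() maps them to exactly 0.
    masked = np.where(valid_mask > 0, logits, -np.inf)
    masked -= masked.max()          # subtract the max for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()

# Hypothetical example: a 5x5 convolution whose valid transitions are
# {keep, 3x3 conv, 5x5 separable conv, skip, null}; growing to 7x7 is invalid.
probs = binary_masked_softmax(
    logits=[1.2, 0.7, 0.9, -0.3, -1.0, 0.4],
    valid_mask=[1, 1, 1, 1, 1, 0],
)
print(probs)  # the invalid last candidate receives probability 0
```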

