Luo Ping, Zhang Ruimao, Ren Jiamin, Peng Zhanglin, Li Jingyu
IEEE Trans Pattern Anal Mach Intell. 2021 Feb;43(2):712-728. doi: 10.1109/TPAMI.2019.2932062. Epub 2021 Jan 8.
We address a learning-to-normalize problem by proposing Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network. SN employs three distinct scopes to compute statistics (means and variances): a channel, a layer, and a minibatch. SN switches between them by learning their importance weights in an end-to-end manner. It has several good properties. First, it adapts to various network architectures and tasks (see Fig. 1). Second, it is robust to a wide range of batch sizes, maintaining high performance even when a small minibatch is used (e.g., 2 images/GPU). Third, SN has no sensitive hyper-parameters, unlike group normalization, which searches for the number of groups as a hyper-parameter. Without bells and whistles, SN outperforms its counterparts on various challenging benchmarks, such as ImageNet, COCO, CityScapes, ADE20K, MegaFace, and Kinetics. Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer? (b) What impacts the choice of normalizers? (c) Do different tasks and datasets prefer different normalizers? We hope SN will help ease the use and understanding of normalization techniques in deep learning. The code of SN has been released at https://github.com/switchablenorms.
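To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of a switchable normalization layer for 4D inputs: it computes instance-, layer-, and batch-scope statistics and blends them with learned softmax importance weights. The class name `SwitchableNorm2d`, the choice of separate logits for means and variances, and the omission of an inference-time running-statistics path are illustrative assumptions; the released code at the repository above is the authoritative implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableNorm2d(nn.Module):
    """Illustrative sketch of Switchable Normalization for (N, C, H, W) inputs.

    Blends Instance Norm, Layer Norm, and Batch Norm statistics with
    softmax-weighted importance logits learned end-to-end. Uses batch
    statistics at all times (no running averages), so it is a training-time
    sketch rather than the authors' released implementation.
    """

    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(1, num_channels, 1, 1))  # gamma
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))   # beta
        # One logit per normalizer (IN, LN, BN), kept separately for the
        # means and the variances (an assumption for illustration).
        self.mean_logits = nn.Parameter(torch.zeros(3))
        self.var_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        # Instance scope: statistics over spatial dims, per sample and channel.
        mu_in = x.mean(dim=(2, 3), keepdim=True)
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        # Layer scope: statistics over channel and spatial dims, per sample.
        mu_ln = x.mean(dim=(1, 2, 3), keepdim=True)
        var_ln = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        # Batch scope: statistics over batch and spatial dims, per channel.
        mu_bn = x.mean(dim=(0, 2, 3), keepdim=True)
        var_bn = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)

        # Softmax importance weights switch among the three normalizers.
        mw = F.softmax(self.mean_logits, dim=0)
        vw = F.softmax(self.var_logits, dim=0)
        mu = mw[0] * mu_in + mw[1] * mu_ln + mw[2] * mu_bn
        var = vw[0] * var_in + vw[1] * var_ln + vw[2] * var_bn

        x_hat = (x - mu) / torch.sqrt(var + self.eps)
        return self.weight * x_hat + self.bias

# Example usage: the layer drops in where BatchNorm2d would normally appear.
sn = SwitchableNorm2d(num_channels=64)
y = sn(torch.randn(2, 64, 32, 32))  # small minibatch, e.g., 2 images/GPU
```

Because the importance weights are ordinary parameters, each SN layer can converge to a different mixture, which is how the paper lets different layers, tasks, and batch sizes prefer different normalizers.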