
Training Faster by Separating Modes of Variation in Batch-Normalized Models.

Author Information

Kalayeh Mahdi M, Shah Mubarak

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2020 Jun;42(6):1483-1500. doi: 10.1109/TPAMI.2019.2895781. Epub 2019 Jan 28.

Abstract

Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNN). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates the training procedure by allowing the safe use of large learning rates and alleviates the need for careful initialization of the parameters. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that, assuming samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. This means that the batch normalizing transform can be explained in terms of kernels that naturally emerge from the probability density function modeling the generative process of the underlying data distribution. Consequently, it promises higher discrimination power for the batch-normalized mini-batch. However, given the rectifying non-linearities employed in CNN architectures, the distribution of the layer outputs is asymmetric. Therefore, for BN to fully benefit from the aforementioned properties, we propose approximating the underlying data distribution not with a single Gaussian density but with a mixture of Gaussian densities. Deriving the Fisher vector for a Gaussian Mixture Model (GMM) reveals that batch normalization can be improved by normalizing independently with respect to the statistics of disentangled sub-populations. We refer to our proposed soft piecewise version of batch normalization as Mixture Normalization (MN). Through an extensive set of experiments on CIFAR-10 and CIFAR-100, using both a 5-layer deep CNN and the modern Inception-V3 architecture, we show that mixture normalization reduces the number of gradient updates required to reach the maximum test accuracy of the batch-normalized model by ∼31%-47% across a variety of training scenarios. Replacing even a few BN modules with MN in the 48-layer deep Inception-V3 architecture is sufficient not only to obtain considerable training acceleration but also to achieve better final test accuracy. We show that similar observations hold for 40- and 100-layer deep DenseNet architectures as well. We complement our study by evaluating the application of mixture normalization to Generative Adversarial Networks (GANs), where "mode collapse" hinders the training process. We solely replace a few batch normalization layers in the generator with our proposed mixture normalization. Our experiments using a Deep Convolutional GAN (DCGAN) on CIFAR-10 show that the mixture-normalized DCGAN not only provides an acceleration of ∼58% but also reaches a lower (better) "Fréchet Inception Distance" (FID) of 33.35, compared to 37.56 for its batch-normalized counterpart.
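The abstract describes the mixture-normalizing transform only in words: each activation is normalized with respect to each Gaussian component's statistics, and the per-component results are blended by that component's posterior responsibility. The following is a minimal NumPy sketch of this soft piecewise normalization, under simplifying assumptions not taken from the paper: it operates on a 1-D batch of activations, the GMM parameters (means, variances, priors) are assumed to be given rather than estimated from the mini-batch (the paper fits them with EM), and BN's learnable scale and shift are omitted. The function name mixture_normalize and its parameters are illustrative only.

```python
# A minimal sketch of soft piecewise (mixture) normalization, assuming the
# GMM parameters are known; the paper instead estimates them from the data.
import numpy as np

def mixture_normalize(x, means, variances, priors, eps=1e-5):
    """Normalize a 1-D mini-batch x (shape (N,)) against a K-component GMM.

    Each sample is normalized with respect to every component's statistics,
    and the K results are blended by the posterior responsibility p(k | x).
    """
    x = np.asarray(x, dtype=np.float64)
    # Component likelihoods p(x | k) for 1-D Gaussians, shape (N, K).
    lik = np.exp(-0.5 * (x[:, None] - means) ** 2 / (variances + eps))
    lik /= np.sqrt(2.0 * np.pi * (variances + eps))
    # Posterior responsibilities p(k | x), shape (N, K).
    post = priors * lik
    post /= post.sum(axis=1, keepdims=True) + eps
    # Normalize against each component's statistics, then blend.
    z = (x[:, None] - means) / np.sqrt(variances + eps)
    return (post * z).sum(axis=1)

# Usage: a bimodal batch, of the kind produced after a rectifying non-linearity.
batch = np.concatenate([np.random.normal(0.0, 0.5, 64),
                        np.random.normal(3.0, 1.0, 64)])
out = mixture_normalize(batch,
                        means=np.array([0.0, 3.0]),
                        variances=np.array([0.25, 1.0]),
                        priors=np.array([0.5, 0.5]))
print(out.mean(), out.std())
```

With a single component (K = 1) the posterior is identically 1 and the transform reduces to ordinary batch normalization, which is the degenerate case the paper's Fisher-vector argument starts from.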

