Yintai Ma, Diego Klabjan
IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6544-6557. doi: 10.1109/TNNLS.2022.3210840. Epub 2024 May 2.
In this article, we propose a generalization of the batch normalization (BN) algorithm, diminishing BN (DBN), in which the BN parameters are updated as a diminishing moving average. BN is so effective at accelerating the convergence of neural network training that it has become common practice. Our proposed DBN algorithm retains the overall structure of the original BN algorithm while introducing a weighted averaging update to some trainable parameters. We provide a convergence analysis showing that DBN converges to a stationary point with respect to the trainable parameters. Our analysis generalizes readily to the original BN algorithm by setting certain parameters to constants. To the best of our knowledge, this is the first convergence analysis of its kind for BN. We analyze a two-layer model with arbitrary activation functions; common activation functions, such as ReLU and any smooth activation function, satisfy our assumptions. In the numerical experiments, we test the proposed algorithm on complex modern CNN models with stochastic gradients (SGs) and ReLU activation on regression, classification, and image reconstruction tasks. We observe that DBN outperforms the original BN algorithm and the layer normalization (LN) benchmark on the MNIST, NI, CIFAR-10, CIFAR-100, and Caltech-UCSD Birds-200-2011 datasets with modern complex CNN models such as ResNet-18 and typical FNN models.
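To make the core idea concrete, the following is a minimal NumPy sketch of a batch-norm forward step whose running statistics are updated with a diminishing moving average. The step-size schedule `alpha_t = 1/t` is one illustrative diminishing choice (it makes the running statistics the cumulative average of batch statistics); the paper's exact weighting of the trainable parameters may differ.

```python
import numpy as np

def dbn_forward(x, state, eps=1e-5):
    """One training-step forward pass of a BN layer whose running
    statistics are updated with a diminishing moving average.

    `state` is (running_mean, running_var, t). The step size
    alpha_t = 1/t is an illustrative diminishing schedule, so the
    running statistics become the average of all batch statistics
    seen so far, rather than an exponential moving average as in
    standard BN.
    """
    running_mean, running_var, t = state
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)

    alpha = 1.0 / t  # diminishing step size
    running_mean = (1 - alpha) * running_mean + alpha * batch_mean
    running_var = (1 - alpha) * running_var + alpha * batch_var

    # Normalize the current batch with its own statistics, as in standard BN.
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)
    return x_hat, (running_mean, running_var, t + 1)
```

With `alpha_t = 1/t`, after `T` steps `running_mean` equals the arithmetic mean of the `T` batch means, so the influence of any single batch diminishes over time instead of staying fixed at a constant momentum.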