IEEE Trans Neural Netw Learn Syst. 2019 Jul;30(7):2043-2051. doi: 10.1109/TNNLS.2018.2876179. Epub 2018 Nov 9.
Batch normalization (BN) has recently become a standard component for accelerating and improving the training of deep neural networks (DNNs). However, BN brings in additional computation, consumes more memory, and significantly slows down each training iteration. Furthermore, the nonlinear square and square-root operations in the normalization procedure impede low bit-width quantization techniques, which have drawn much attention from the deep learning hardware community. In this paper, we propose an L1-norm BN (L1BN) with only linear operations in both forward and backward propagations during training. L1BN is approximately equivalent to the conventional L2-norm BN (L2BN) up to a scaling factor equal to √(π/2). Experiments on various convolutional neural networks and generative adversarial networks reveal that L1BN maintains the same performance and convergence rate as L2BN but with higher computational efficiency. In real application-specific integrated circuit (ASIC) synthesis with reduced resources, L1BN achieves a 25% speedup and 37% energy saving compared with the original L2BN. Our hardware-friendly normalization method not only surpasses L2BN in speed but also simplifies the design of deep learning accelerators. Last but not least, L1BN promises fully quantized training of DNNs, which empowers future artificial intelligence applications on mobile devices with transfer and continual learning capability.
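To make the √(π/2) equivalence concrete, the sketch below is a minimal NumPy forward pass, not the paper's reference implementation; the function name l1_batch_norm, its eps argument, and the gamma/beta parameters are illustrative choices. It normalizes each feature by the mean absolute deviation scaled by √(π/2), which equals the standard deviation in expectation for Gaussian-distributed activations.

    import numpy as np

    def l1_batch_norm(x, gamma, beta, eps=1e-5):
        """Forward-pass sketch of L1-norm batch normalization.

        The L2 statistic (standard deviation, requiring square and sqrt)
        is replaced by the mean absolute deviation scaled by sqrt(pi/2),
        which matches the standard deviation in expectation when the
        activations are Gaussian.
        """
        mu = x.mean(axis=0)                        # per-feature batch mean
        l1_dev = np.abs(x - mu).mean(axis=0)       # mean absolute deviation (L1 statistic)
        sigma_l1 = np.sqrt(np.pi / 2.0) * l1_dev   # scaled to track the L2 standard deviation
        x_hat = (x - mu) / (sigma_l1 + eps)        # normalize
        return gamma * x_hat + beta                # learnable affine transform

    # Quick check that the scaled L1 statistic tracks the L2 standard deviation
    x = np.random.randn(4096, 8) * 3.0 + 1.0       # synthetic Gaussian batch, std ~= 3
    print(np.sqrt(np.pi / 2.0) * np.abs(x - x.mean(0)).mean(0))  # ~= 3.0 per feature
    print(x.std(0))                                              # ~= 3.0 per feature
    y = l1_batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))     # normalized output

Because only means and absolute values appear, both the forward pass and its gradient are piecewise linear, which is what makes the method amenable to low bit-width quantization and cheaper ASIC datapaths.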