Chen Zhaodong, Deng Lei, Li Guoqi, Sun Jiawei, Hu Xing, Liang Ling, Ding Yufei, Xie Yuan
IEEE Trans Neural Netw Learn Syst. 2021 Jan;32(1):348-362. doi: 10.1109/TNNLS.2020.2978753. Epub 2021 Jan 4.
Deep neural networks (DNNs) have thrived in recent years, and batch normalization (BN) plays an indispensable role in their training. However, BN is costly because of its large reduction and elementwise operations, which are hard to parallelize and therefore slow down training considerably. To address this issue, we propose a methodology that alleviates BN's cost by using only a few sampled or generated data points for mean and variance estimation at each iteration. The key challenge is to strike a satisfactory balance between normalization effectiveness and execution efficiency: effectiveness demands less data correlation in sampling, whereas efficiency demands more regular execution patterns. To this end, we design two categories of approach, which either sample or create a few uncorrelated data for statistics estimation under certain strategy constraints. The former category includes "batch sampling (BS)," which randomly selects a few samples from each batch, and "feature sampling (FS)," which randomly selects a small patch from each feature map of all samples; the latter is "virtual data set normalization (VDN)," which generates a few synthetic random samples to directly create uncorrelated data for statistics estimation. Accordingly, multiway strategies are designed both to reduce data correlation for accurate estimation and to optimize the execution pattern for acceleration. The proposed methods are comprehensively evaluated on various DNN models, where the loss in model accuracy and convergence rate is negligible. Without the support of any specialized libraries, 1.98× BN-layer acceleration and a 23.2% overall training speedup are practically achieved on modern GPUs. Furthermore, our methods perform strongly on the well-known "micro-BN" problem that arises with tiny batch sizes.
This article provides a promising solution for the efficient training of high-performance DNNs.
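The "batch sampling (BS)" idea described above can be sketched as follows: BN statistics are estimated from a small random subset of the batch, but the resulting mean and variance normalize the full batch. This is a minimal illustrative sketch in NumPy, not the authors' implementation; the function name, default sample count, and epsilon are assumptions.

```python
import numpy as np

def sampled_batch_norm(x, num_samples=4, eps=1e-5, rng=None):
    """Illustrative 'batch sampling (BS)' variant of batch normalization.

    Instead of reducing over the entire batch, per-channel mean and
    variance are estimated from a few randomly chosen samples (as the
    paper proposes), which shrinks the costly reduction.

    x: activations of shape (N, C, H, W).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # Randomly pick a few samples from the batch for statistics estimation.
    idx = rng.choice(n, size=min(num_samples, n), replace=False)
    subset = x[idx]                                    # (num_samples, C, H, W)
    mean = subset.mean(axis=(0, 2, 3), keepdims=True)  # per-channel mean
    var = subset.var(axis=(0, 2, 3), keepdims=True)    # per-channel variance
    # Normalize the FULL batch with the sampled statistics.
    return (x - mean) / np.sqrt(var + eps)
```

When `num_samples` equals the batch size, this reduces to standard (affine-free) BN; "feature sampling (FS)" would instead slice a small spatial patch from every sample's feature maps before the same reduction.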