
Learned Gradient Compression for Distributed Deep Learning.

Author Information

Abrahamyan Lusine, Chen Yiming, Bekoulis Giannis, Deligiannis Nikos

Publication Information

IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7330-7344. doi: 10.1109/TNNLS.2021.3084806. Epub 2022 Nov 30.

Abstract

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, in which the model is replicated across several computational nodes, each with access to a different chunk of the data. This approach, however, incurs high communication cost and latency, because the computed gradients must be shared among the nodes at every iteration. The problem becomes more pronounced when the nodes communicate wirelessly (e.g., owing to limited network bandwidth). To address this problem, various compression methods have been proposed, including sparsification, quantization, and entropy encoding of the gradients. Existing methods exploit intra-node information redundancy; that is, they compress the gradients at each node independently. In contrast, we argue that the gradients across the nodes are correlated and propose methods that leverage this inter-node redundancy to improve compression efficiency. Depending on the node communication protocol (parameter server or ring-allreduce), we propose two instances of gradient compression that we coin Learned Gradient Compression (LGC). Our methods exploit an autoencoder, trained during the first stages of the distributed training, to capture the common information present in the gradients of the distributed nodes. To constrain the nodes' computational complexity, the autoencoder is realized with a lightweight neural network. We have tested our LGC methods on image classification and semantic segmentation tasks using different convolutional neural networks (CNNs) [ResNet50, ResNet101, and the pyramid scene parsing network (PSPNet)] and multiple datasets (ImageNet, Cifar10, and CamVid).
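The intra-node sparsification baseline the abstract refers to can be sketched as top-k gradient selection: each node transmits only the largest-magnitude entries of its gradient (methods such as DGC additionally accumulate the dropped entries locally as residual error). This is an illustrative sketch, not the paper's LGC method; the tensor shape and the 1% keep ratio are arbitrary choices for the example.

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Returns (indices, values): the compressed representation a node would
    transmit instead of the dense gradient.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    # Indices of the k largest-magnitude entries (unordered, O(n) selection).
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def desparsify(idx, vals, shape):
    """Reconstruct a dense (mostly zero) gradient from (indices, values)."""
    flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)

rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in gradient tensor
idx, vals = topk_sparsify(grad, ratio=0.01)   # transmit ~1% of the entries
recon = desparsify(idx, vals, grad.shape)     # receiver-side reconstruction
```

Because each node selects its top-k entries independently, this baseline exploits only intra-node redundancy, which is precisely the limitation the paper's inter-node approach targets.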
The ResNet101 model trained for image classification on Cifar10 reached an accuracy of 93.57%, only 0.18% below the baseline distributed training with uncompressed gradients, while the communicated gradient data were reduced by 8095× compared with the baseline and by 8× compared with the state-of-the-art deep gradient compression (DGC) method.
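The autoencoder-based compression described above can be illustrated with a minimal sketch: a node encodes its (flattened) gradient into a low-dimensional code, transmits the code, and the receiver decodes it. Note the assumptions: the paper's LGC uses a lightweight CNN autoencoder trained on gradients collected during the first stages of distributed training to capture inter-node common information; the tied linear encoder/decoder, random weights, and dimensions below are hypothetical stand-ins chosen only to show the data flow and the nominal rate reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a node's flattened gradient chunk and the
# autoencoder's bottleneck dimension.
GRAD_DIM, CODE_DIM = 1024, 32

# Linear encoder/decoder with tied weights stands in for the paper's
# lightweight CNN autoencoder; in LGC these would be learned, not random.
W_enc = rng.standard_normal((CODE_DIM, GRAD_DIM)) / np.sqrt(GRAD_DIM)
W_dec = W_enc.T

def compress(grad):
    # The node transmits only this CODE_DIM-dimensional code.
    return W_enc @ grad

def decompress(code):
    # The receiver (parameter server or next ring neighbor) reconstructs.
    return W_dec @ code

grad = rng.standard_normal(GRAD_DIM)      # stand-in gradient vector
code = compress(grad)
recon = decompress(code)
ratio = GRAD_DIM / CODE_DIM               # nominal compression factor
```

In the paper's setting the savings compound further because the learned code captures information common to all nodes' gradients, so per-node residuals stay small.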

