Department of Computer Science and Technology, North China University of Science and Technology, Tangshan 063210, China.
Sensors (Basel). 2021 Mar 10;21(6):1943. doi: 10.3390/s21061943.
Running Deep Neural Networks (DNNs) on distributed Internet of Things (IoT) nodes is a promising way to enhance the performance of IoT systems. However, because IoT nodes have limited computing and communication resources, the communication efficiency of distributed DNN training is an urgent problem. This paper proposes an adaptive compression strategy based on gradient partitioning to reduce the high inter-node communication overhead incurred during distributed training. First, a neural network is trained to predict the gradient distribution of its parameters. Based on the characteristics of this distribution, the gradients are divided into a key region and a sparse region. Then, using the information entropy of the gradient distribution, a suitable threshold is selected to filter the gradient values within each partition; only gradients exceeding the threshold are transmitted and updated, which reduces communication traffic and improves distributed training efficiency. By exploiting gradient sparsity, the strategy achieves a maximum compression ratio of 37.1, improving training efficiency to a certain extent.
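The core idea of threshold-based gradient compression can be sketched as follows. This is a minimal illustrative implementation, not the paper's method: the `keep_ratio` parameter, the top-k choice of threshold, and the local residual accumulation are all assumptions standing in for the paper's entropy-based partition-and-threshold scheme.

```python
import numpy as np

def sparsify_gradient(grad, keep_ratio=0.02):
    """Transmit only large-magnitude gradients (a simplified sketch).

    Returns (indices, values, residual): entries whose magnitude meets
    the threshold are sent; the rest are kept locally as a residual to
    be added into the next iteration's gradient.
    """
    flat = grad.ravel()
    k = max(1, int(len(flat) * keep_ratio))
    # Threshold = magnitude of the k-th largest entry (an assumption;
    # the paper derives its threshold from the distribution's entropy).
    thresh = np.partition(np.abs(flat), -k)[-k]
    mask = np.abs(flat) >= thresh
    residual = np.where(mask, 0.0, flat)  # small gradients stay local
    return np.nonzero(mask)[0], flat[mask], residual.reshape(grad.shape)

# Example: a gradient with a few large "key region" entries
rng = np.random.default_rng(0)
g = rng.normal(scale=0.01, size=1000)
g[:10] += 1.0                      # inject large-magnitude gradients
idx, vals, res = sparsify_gradient(g, keep_ratio=0.02)
ratio = g.size / idx.size          # rough compression ratio
```

In a distributed setting, each worker would send only `(idx, vals)` to the parameter server, so the compression ratio is roughly the inverse of `keep_ratio`; accumulating the residual locally is a common way to avoid losing the filtered-out gradient mass.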