Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen 518172, China.
Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544.
Proc Natl Acad Sci U S A. 2021 Apr 27;118(17). doi: 10.1073/pnas.2024789118.
Federated learning (FL) enables edge devices, such as Internet of Things (IoT) sensors, as well as servers and institutions (e.g., hospitals), to collaboratively train a machine learning (ML) model without sharing their private data. FL requires devices to exchange their ML parameters iteratively, so the time needed to jointly learn a reliable model depends not only on the number of training steps but also on the ML parameter transmission time per step. In practice, FL parameter transmissions are often carried out by a multitude of participating devices over resource-limited communication networks, for example, wireless networks with limited bandwidth and power. The repeated FL parameter transmission from edge devices therefore induces a notable delay, which can exceed the ML model training time by orders of magnitude; hence, communication delay constitutes a major bottleneck in FL. Here, a communication-efficient FL framework is proposed to jointly improve the FL convergence time and the training loss. In this framework, a probabilistic device selection scheme is designed such that the devices that can significantly improve the convergence speed and training loss have higher probabilities of being selected for ML model transmission. To further reduce the FL convergence time, a quantization method is proposed to reduce the volume of the model parameters exchanged among devices, and an efficient wireless resource allocation scheme is developed. Simulation results show that the proposed FL framework can improve the identification accuracy and convergence time by up to 3.6% and 87%, respectively, compared with standard FL.
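The two communication-saving ingredients described above, probabilistic device selection and quantization of exchanged parameters, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the scoring rule (weighting each device's local-update magnitude by its channel rate) and the uniform stochastic quantizer are simplified stand-ins, and the names `selection_probabilities` and `stochastic_quantize` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def selection_probabilities(update_norms, channel_rates):
    """Assign higher selection probability to devices whose updates
    matter more and whose links are faster (simplified stand-in for
    the paper's convergence/delay-aware selection scheme)."""
    scores = np.asarray(update_norms) * np.asarray(channel_rates)
    return scores / scores.sum()

def stochastic_quantize(w, num_bits=4):
    """Uniform stochastic quantization of a parameter vector to
    num_bits per entry; the random rounding keeps the quantizer
    unbiased in expectation."""
    lo, hi = w.min(), w.max()
    span = hi - lo + 1e-12          # guard against a constant vector
    levels = 2 ** num_bits - 1
    scaled = (w - lo) / span * levels
    floor = np.floor(scaled)
    # round up with probability equal to the fractional part
    q = floor + (rng.random(w.shape) < (scaled - floor))
    return lo + q / levels * span

# One round: sample a subset of devices by importance, then send
# quantized updates instead of full-precision ones.
probs = selection_probabilities([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
chosen = rng.choice(3, size=2, replace=False, p=probs)
update = rng.standard_normal(1000)
compressed = stochastic_quantize(update, num_bits=4)
```

With 4-bit quantization each entry takes one of at most 16 values, so the per-round payload shrinks by roughly 8x relative to 32-bit floats, at the cost of bounded quantization noise in the aggregated model.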