Xiao Peng, Cheng Samuel, Stankovic Vladimir, Vukobratovic Dejan
Department of Computer Science and Technology, Tongji University, Shanghai 201804, China.
The School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK 73019, USA.
Entropy (Basel). 2020 Mar 11;22(3):314. doi: 10.3390/e22030314.
Federated learning is a decentralized form of deep learning that trains a shared model on data distributed across clients (such as mobile phones and wearable devices), preserving data privacy by never exposing raw data to the data center (server). After each client computes new model parameters by stochastic gradient descent (SGD) on its own local data, these locally computed parameters are aggregated to produce an updated global model. Many current state-of-the-art studies aggregate the client-computed parameters by averaging them, but none theoretically explains why averaging is a good approach. In this paper, we treat each client-computed parameter as a random vector, owing to the stochastic nature of SGD, and estimate the mutual information between two client-computed parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show an increasing trend of mutual information with training iterations. However, when we further compute the distance between client-computed parameters, we find that the parameters become more correlated without becoming closer. This phenomenon suggests that averaging parameters may not be the optimum way of aggregating trained parameters.
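To make the aggregation step discussed above concrete, the following is a minimal sketch of averaging client-computed parameters across communication rounds. The toy least-squares objective, the random data shards, and all names (e.g., local_sgd_step, federated_average) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of the aggregation step described in the abstract:
# each client runs local SGD on its own data shard, and the server
# averages the locally computed parameter vectors into a new global model.

rng = np.random.default_rng(0)

def local_sgd_step(params, data, lr=0.1):
    """One SGD step on a toy least-squares objective ||X @ params - y||^2 (assumed for illustration)."""
    X, y = data
    grad = 2.0 * X.T @ (X @ params - y) / len(y)
    return params - lr * grad

def federated_average(client_params):
    """Server-side aggregation: element-wise mean of the client parameter vectors."""
    return np.mean(np.stack(client_params), axis=0)

# Toy setup: 3 clients, each holding its own local data shard.
dim = 5
global_params = np.zeros(dim)
clients = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(3)]

for rnd in range(10):                      # communication rounds
    client_params = []
    for data in clients:
        p = global_params.copy()
        for _ in range(5):                 # local SGD steps per round
            p = local_sgd_step(p, data)
        client_params.append(p)
    # The abstract questions whether this simple mean is optimal, since the
    # client parameters become correlated without necessarily becoming closer.
    global_params = federated_average(client_params)

print(global_params)
```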