School of Information and Communication Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea.
Sensors (Basel). 2023 Jan 19;23(3):1152. doi: 10.3390/s23031152.
Due to the distributed data collection and learning in federated learning, many clients conduct local training on non-independent and identically distributed (non-IID) datasets, and training on such datasets causes severe performance degradation. We propose an efficient algorithm that enhances the performance of federated learning by overcoming the negative effects of non-IID datasets. First, intra-client class imbalance is reduced by rendering each client's class distribution close to the uniform distribution. Second, the clients that participate in federated learning are selected so that their aggregated class distribution is close to the uniform distribution, thereby mitigating inter-client class imbalance, i.e., the difference in class distributions among clients. In addition, the amount of local training data for the selected clients is finely adjusted. Finally, to increase the efficiency of federated learning, the batch size and learning rate of local training for the selected clients are dynamically controlled to reflect the effective size of each client's local dataset. In performance evaluations on the CIFAR-10 and MNIST datasets, the proposed algorithm achieves 20% higher accuracy than existing federated learning algorithms. Moreover, while achieving this substantial accuracy improvement, the proposed algorithm consumes less computation and communication resources than existing algorithms in terms of the amount of data used and the number of clients participating in training.
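The abstract's two most concrete mechanisms, selecting clients so that their aggregated class distribution approaches the uniform distribution and scaling each selected client's batch size and learning rate with the effective size of its local dataset, can be illustrated with a short sketch. The following Python code is a minimal illustration under assumptions made here, not the paper's implementation: the greedy selection rule, the linear batch/learning-rate scaling, and all names (select_clients, local_hyperparams, ref_size) are hypothetical.

# Illustrative sketch only: greedy client selection toward a uniform aggregate
# class distribution, plus effective-size-based batch/learning-rate scaling.
# The paper's exact selection criterion and control rule are not specified here.
import numpy as np

def select_clients(class_counts, num_selected):
    # class_counts: (num_clients, num_classes) array of per-client label counts.
    num_clients, num_classes = class_counts.shape
    uniform = np.full(num_classes, 1.0 / num_classes)
    selected, aggregate = [], np.zeros(num_classes)
    for _ in range(num_selected):
        best, best_dist = None, np.inf
        for c in range(num_clients):
            if c in selected:
                continue
            # Tentatively add client c and measure distance to uniform.
            candidate = aggregate + class_counts[c]
            dist = np.linalg.norm(candidate / candidate.sum() - uniform)
            if dist < best_dist:
                best, best_dist = c, dist
        selected.append(best)
        aggregate += class_counts[best]
    return selected

def local_hyperparams(effective_size, ref_size=500, base_batch=32, base_lr=0.01):
    # Linear-scaling heuristic (an assumption): larger effective local datasets
    # get proportionally larger batches and learning rates.
    ratio = effective_size / ref_size
    batch = max(1, int(round(base_batch * ratio)))
    lr = base_lr * ratio
    return batch, lr

# Example: 10 clients, 10 classes, skewed (non-IID) label counts.
rng = np.random.default_rng(0)
counts = rng.multinomial(500, rng.dirichlet(np.ones(10) * 0.3), size=10)
for c in select_clients(counts, num_selected=4):
    b, lr = local_hyperparams(counts[c].sum())
    print(f"client {c}: batch={b}, lr={lr:.4f}")

The greedy criterion above minimizes, at each step, the Euclidean distance between the normalized aggregate class histogram and the uniform distribution; the paper may use a different distance measure or selection strategy.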