A robust sampling technique for realistic distribution simulation in federated learning.

Authors

Hoepp Robin, Rist Leonhard, Katzmann Alexander, Ashok Raghavan, Wimmer Andreas, Sühling Michael, Maier Andreas

Affiliations

Computed Tomography, Siemens Healthineers, Forchheim, Germany.

Pattern Recognition Lab, FAU Erlangen-Nürnberg, Erlangen, Germany.

Publication information

Int J Comput Assist Radiol Surg. 2025 Sep 2. doi: 10.1007/s11548-025-03504-z.

DOI:10.1007/s11548-025-03504-z

PMID:40892192

Abstract

PURPOSE

Federated Learning helps train deep learning networks with diverse data from different locations, particularly in restricted clinical settings. However, label distributions that overlap only partially across clients, due to different demographics, may significantly harm global training and thus local model performance. Investigating such effects before rolling out large-scale Federated Learning setups requires proper sampling of the expected label distributions.

METHODS

We present a sampling algorithm that builds data subsets with desired means and standard deviations from an initial global distribution. To this end, we incorporate the chi-squared and Gini impurity measures to numerically and efficiently optimize label distributions for multiple groups.
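The abstract does not spell out how the two measures enter the optimization, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a Gaussian target per group, a fixed binning of the label, and a simple weighted draw without replacement (the exponential-key trick) in place of the paper's numerical optimization. All function names, bin edges, and the synthetic label pool are hypothetical.

```python
import math
import random
from bisect import bisect_right

def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def target_bin_probs(edges, mean, std):
    """Mass of N(mean, std) in each bin, renormalized over the binned range."""
    masses = [normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
              for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(masses)
    return [m / total for m in masses]

def bin_index(value, edges):
    """Index of the bin containing value; out-of-range values go to the end bins."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)

def histogram(values, edges):
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[bin_index(v, edges)] += 1
    return counts

def chi_squared(observed_counts, expected_probs):
    """Pearson chi-squared statistic of observed counts against expected proportions."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed_counts, expected_probs) if p > 0)

def gini_impurity(counts):
    """1 - sum(p_i^2): 0 for a single occupied bin, larger for spread-out labels."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def sample_subset(labels, edges, mean, std, size, rng):
    """Assumed strategy: weight each sample by target mass / pool count of its bin,
    then take a weighted sample without replacement (smallest exponential keys)."""
    target = target_bin_probs(edges, mean, std)
    pool_counts = histogram(labels, edges)
    keyed = []
    for idx, label in enumerate(labels):
        b = bin_index(label, edges)
        weight = target[b] / pool_counts[b]
        if weight > 0:
            keyed.append((rng.expovariate(1.0) / weight, idx))
    keyed.sort()
    return [idx for _, idx in keyed[:size]]

# Synthetic "global" label pool (hypothetical body weights in kg).
rng = random.Random(0)
pool = [rng.gauss(75.0, 15.0) for _ in range(5000)]
edges = list(range(30, 135, 5))  # 5 kg bins from 30 to 130

# One hypothetical client whose population is lighter and less varied.
chosen = sample_subset(pool, edges, mean=65.0, std=8.0, size=500, rng=rng)
subset = [pool[i] for i in chosen]
target = target_bin_probs(edges, 65.0, 8.0)

print("chi-squared vs. target:", chi_squared(histogram(subset, edges), target))
print("Gini impurity of subset:", gini_impurity(histogram(subset, edges)))
```

In this sketch the chi-squared statistic measures how closely a drawn subset matches the requested distribution, while the Gini impurity summarizes how spread out its labels are. A real implementation would use such scores as objectives when optimizing several groups at once.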

RESULTS

Using a real-world application scenario, we sample train and test groups according to region-specific distributions for 3D camera-based weight and height estimation in a clinical context, comparing a hard data split serving as a baseline with our proposed sampling technique. We train a baseline model on all data for comparison and use Federated Averaging to combine the training of our data subsets, demonstrating a realistic deterioration of 25.3% in weight and 28.7% in height estimation by the global model.
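For readers unfamiliar with Federated Averaging, its aggregation step can be sketched as a sample-size-weighted mean of the clients' model parameters. The snippet below is a generic illustration with flat parameter lists and made-up numbers. It is not the training setup used in the study, and the function name is hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Aggregate client parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

Because larger clients dominate the average, a client whose label distribution differs strongly from the others can pull the global model away from what smaller clients need, which is the effect the sampled subsets are meant to expose.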

CONCLUSIONS

Realistically client-biased label distributions can notably harm training in a federated context. Our sampling algorithm for simulating realistic data distributions opens up an efficient way to analyze this effect in advance. The technique is agnostic to the chosen network architecture and target scenario and can be adapted to any feature or label problem with non-IID subpopulations.
