Federated learning for enhanced dose-volume parameter prediction with decentralized data.

Authors

Zhang Jiahan, Lei Yang, Xia Junyi, Chao Ming, Liu Tian

Affiliation

Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

Publication information

Med Phys. 2025 Mar;52(3):1408-1415. doi: 10.1002/mp.17566. Epub 2024 Dec 6.

Abstract

BACKGROUND

The widespread adoption of knowledge-based planning in radiation oncology clinics is hindered by the lack of data and the difficulty associated with sharing medical data.

PURPOSE

This study assesses the feasibility of mitigating this challenge through federated learning (FL), in which a centralized model is trained on distributed datasets while the data remain localized and private.

METHODS

This concept was tested using 273 prostate 45 Gy plans. The cases were split into a training set of 220 cases and a validation set of 53 cases. The training set was further divided into 10 subsets to simulate treatment plans from different clinics. A gradient-boosting model was used to predict bladder and rectum V, V, and V. The Federated Averaging algorithm was employed to aggregate the individual model weights from the distributed datasets. Grid search with five-fold cross-validation within the training set was used to tune model hyperparameters. Additionally, we evaluated the robustness of the FL approach by varying the distribution of the training data in several scenarios, including different numbers of sites and imbalanced data across sites.
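The Federated Averaging step described above can be illustrated with a minimal sketch. The abstract does not say how gradient-boosting weights are aggregated, so this example swaps in a one-feature linear model as a stand-in, and the data are synthetic. All function names (`local_update`, `federated_average`, `train_fedavg`), the learning rate, and the number of rounds are assumptions made for illustration only. The 220/53 split and the 10 simulated sites mirror the numbers in the text.

```python
import random

def local_update(weights, data, lr=0.3, epochs=5):
    """Run a few gradient-descent epochs on one site's data, starting
    from the current global weights. Raw data never leaves the site."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its number of cases."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)

def train_fedavg(sites, rounds=50):
    """Alternate local training and server-side averaging."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        sizes = [len(data) for data in sites]
        global_weights = federated_average(updates, sizes)
    return global_weights

def mean_absolute_error(weights, data):
    w, b = weights
    return sum(abs(w * x + b - y) for x, y in data) / len(data)

# Synthetic stand-in data (NOT the study's plans): y = 3x + 1 plus noise.
rng = random.Random(0)

def make_case():
    x = rng.uniform(0.0, 1.0)
    return (x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05))

train = [make_case() for _ in range(220)]      # 220 training cases
val = [make_case() for _ in range(53)]         # 53 validation cases
sites = [train[i::10] for i in range(10)]      # 10 simulated clinics

fl_weights = train_fedavg(sites)
fl_mae = mean_absolute_error(fl_weights, val)
print(f"federated model: w={fl_weights[0]:.2f}, b={fl_weights[1]:.2f}, MAE={fl_mae:.3f}")
```

Only model weights cross site boundaries in this sketch: each site returns its `(w, b)` pair, and the server combines them in proportion to local case counts. An imbalanced scenario like the one mentioned above could be simulated by slicing `train` into subsets of unequal size.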

RESULTS

The mean absolute error (MAE) of the FL model (4.7% ± 2.9%) is significantly lower than that of individual models trained separately (6.5% ± 4.9%, p < 0.001) and similar to that of a traditional centralized model (4.4% ± 2.8%, p = 0.14). The federated model is robust to the number of subsets, with MAEs of 4.7% ± 3.2%, 4.8% ± 3.1%, 4.8% ± 2.9%, 4.5% ± 2.8%, 4.9% ± 3.3%, and 4.8% ± 3.1% for 5, 10, 15, 20, 25, and 30 subsets, respectively. For the two imbalanced datasets, the FL model achieves MAEs of 4.5% ± 2.9% and 5.6% ± 4.0%, non-inferior to the balanced data model. For all bladder and rectum metrics, the FL model significantly outperforms 36.7% of individual models.
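As a reading aid for the "MAE ± value" figures above, the following sketch computes a mean absolute error together with the spread of the absolute errors. It assumes the ± term is a sample standard deviation of the per-case absolute errors, which the abstract does not state. The function name `mae_summary` is hypothetical, and the numbers are made up for illustration and are not taken from the study.

```python
from statistics import mean, stdev

def mae_summary(predicted, actual):
    """Return (mean, sample standard deviation) of the absolute errors."""
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(abs_errors), stdev(abs_errors)

# Illustrative dose-volume values in percent of organ volume (made up).
predicted = [42.0, 30.5, 18.0, 55.0]
actual = [40.0, 35.0, 17.0, 50.0]

mae, sd = mae_summary(predicted, actual)
print(f"MAE = {mae:.1f}% ± {sd:.1f}%")
```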

CONCLUSIONS

This study demonstrates the potential advantages of implementing a federated model over training individual models: the proposed FL approach achieves prediction accuracy similar to that of a conventional model without requiring centralized data storage. Even when local models struggle to produce accurate predictions because of data scarcity, the federated model consistently maintains high performance.
