Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, USA.
Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Nat Commun. 2022 Dec 5;13(1):7346. doi: 10.1038/s41467-022-33407-5.
Although machine learning (ML) has shown promise across disciplines, its out-of-sample generalizability remains a concern. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML by sharing only numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate that our study will: 1) enable more healthcare studies informed by large, diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma through the release of our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
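The abstract's key mechanism is that sites exchange only numerical model updates, never patient data. The following is a minimal illustrative sketch of one aggregation round, assuming the common federated averaging (FedAvg) rule in which each site's model is weighted by its local sample count; the abstract does not state the study's exact aggregation rule. The names `local_update` and `federated_average` and the dict-of-lists model representation are hypothetical. Real segmentation networks, training loops, and secure communication are omitted.

```python
from typing import Dict, List

# Hypothetical model representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def local_update(params: Params, grads: Params, lr: float) -> Params:
    """One illustrative gradient step computed at a site on its private data.

    Only the resulting numbers are sent to the aggregator; the images stay local.
    """
    return {k: [w - lr * g for w, g in zip(params[k], grads[k])] for k in params}


def federated_average(updates: List[Params], n_samples: List[int]) -> Params:
    """Combine site models into a consensus model (FedAvg-style, assumed).

    Each site's parameters are weighted by its share of the total samples.
    """
    if not updates or len(updates) != len(n_samples):
        raise ValueError("need one sample count per site update")
    total = sum(n_samples)
    return {
        k: [
            sum(u[k][i] * n / total for u, n in zip(updates, n_samples))
            for i in range(len(updates[0][k]))
        ]
        for k in updates[0]
    }


if __name__ == "__main__":
    global_model = {"w": [0.0, 0.0]}
    # Two hypothetical sites with different gradients and dataset sizes.
    site_grads = [{"w": [1.0, 2.0]}, {"w": [-1.0, 0.0]}]
    site_sizes = [300, 100]
    updates = [local_update(global_model, g, lr=0.1) for g in site_grads]
    new_global = federated_average(updates, site_sizes)
    print({k: [round(x, 6) for x in v] for k, v in new_global.items()})
```

Weighting by sample count lets larger sites contribute proportionally more to the consensus model, which is one plausible design choice among several when site sizes vary as widely as they do across 71 institutions.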