Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy.
Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy.
Eur J Nucl Med Mol Imaging. 2021 Nov;48(12):3791-3804. doi: 10.1007/s00259-021-05339-7. Epub 2021 Apr 13.
The present scoping review aims to assess whether distributed learning is non-inferior to centrally and locally trained machine learning (ML) models in medical applications.
We performed a literature search using the search terms "distributed learning" OR "federated learning" in the PubMed/MEDLINE and EMBASE databases. No start date limit was applied, and the search covered articles published up to July 21, 2020. We excluded articles outside the field of interest; guidelines or expert opinion, review articles and meta-analyses, editorials, letters or commentaries, and conference abstracts; articles not in the English language; and studies not using medical data. Selected studies were classified and analysed according to their aim(s).
We included 26 papers aimed at predicting one or more outcomes, namely risk, diagnosis, prognosis, and treatment side effect/adverse drug reaction. Distributed learning was compared to centralized or localized training in 21/26 and 14/26 selected papers, respectively. Regardless of the aim, the type of input, the method, and the classifier, distributed learning performed close to centralized training, except in two experiments focused on diagnosis. In all but two cases, distributed learning outperformed locally trained models.
Distributed learning proved to be a reliable strategy for model development; indeed, it performed on par with models trained on centralized datasets. Importantly, sensitive data are preserved, since they are not shared for model development. Distributed learning constitutes a promising solution for ML-based research and practice, since large, diverse datasets are crucial for success.
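To illustrate the comparison the review draws between distributed and centralized training, the following is a minimal sketch (not taken from any of the reviewed papers) of federated averaging for a logistic regression model, contrasted with a centralized baseline trained on pooled data. All site names, data, and hyperparameters are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: federated averaging (FedAvg) vs. centralized training.
# Each "site" keeps its data locally; only model weights are shared.
import numpy as np

rng = np.random.default_rng(0)

def make_site(n, shift):
    """Simulate one hospital's private dataset (features X, binary labels y)."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
    y = (1 / (1 + np.exp(-(X @ w_true))) > rng.random(n)).astype(float)
    return X, y

def local_train(w, X, y, epochs=5, lr=0.1):
    """A few epochs of gradient descent on one site's data only."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))          # predicted probabilities
        w = w - lr * X.T @ (p - y) / len(y)     # logistic-loss gradient step
    return w

sites = [make_site(200, s) for s in (-0.5, 0.0, 0.5)]  # three simulated centers

# Federated averaging: the server aggregates locally updated weights.
w_global = np.zeros(5)
for _ in range(20):
    local_weights = [local_train(w_global.copy(), X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    w_global = np.average(local_weights, axis=0, weights=sizes)

# Centralized baseline: the same model trained on the pooled data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
w_central = local_train(np.zeros(5), X_all, y_all, epochs=100)

def accuracy(w, X, y):
    return np.mean(((X @ w) > 0).astype(float) == y)

print("federated accuracy  :", accuracy(w_global, X_all, y_all))
print("centralized accuracy:", accuracy(w_central, X_all, y_all))
```

Under these simplified assumptions, the two accuracies are typically close, which mirrors the qualitative finding of the review that distributed learning can approach centralized training without pooling patient-level data.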