Tahir Nurfaidah, Jung Chau-Ren, Lee Shin-Da, Azizah Nur, Ho Wen-Chao, Li Tsai-Chung
Department of Public Health, College of Public Health, China Medical University, No. 100, Section 1, Jingmao Road, Beitun District, Taichung, 406040, Taiwan, 886 422053366 ext 6117.
Department of Industrial Engineering, Hasanuddin University, Makassar, Indonesia.
J Med Internet Res. 2025 Jul 21;27:e65708. doi: 10.2196/65708.
The rise of federated learning (FL) as a privacy-preserving technology offers the potential to build models collaboratively in a decentralized manner, addressing confidentiality concerns such as data privacy. However, clear and comprehensive evidence comparing the performance of FL with that of established centralized machine learning (CML) in the clinical domain is scarce.
This study aimed to review the performance comparisons of FL-based and CML models for mortality prediction in clinical settings.
Experimental studies comparing the performance of FL and CML in predicting mortality were selected. Articles were excluded if they did not compare FL with CML or only compared the effectiveness of different FL baseline models. Two independent reviewers performed the screening, data extraction, and risk of bias assessment. The IEEE Xplore, PubMed, ScienceDirect, and Web of Science databases were searched for articles published up to June 2024. The risk of bias was assessed using CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies) and PROBAST (Prediction Model Risk of Bias Assessment Tool). Meta-analyses of the pooled area under the receiver operating characteristic curve (AUROC)/area under the curve (AUC) were performed for within-group comparisons (before and after federation).
Nine articles with heterogeneous framework designs, scenarios, and clinical contexts were included: 4 articles focused on specific case types, 3 were conducted in intensive care unit settings, and 2 in emergency departments, urgent centers, or trauma centers. All of the included studies used cohort datasets, involving 1,412,973 participants in total. These studies universally indicated that the predictive performance of FL models is comparable to that of CML. The pooled AUCs for FL and CML were 0.81 (95% CI 0.76-0.85; I²=78.36%) and 0.82 (95% CI 0.77-0.86; I²=72.33%), respectively. The Higgins I² test indicated high heterogeneity between the included studies (I²≥50%). In total, 4 of the 9 (44%) developed models were identified as having a high risk of bias.
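As an illustration of how a pooled estimate and the Higgins I² statistic of this kind can be computed, the following is a minimal sketch. It assumes DerSimonian-Laird random-effects pooling of untransformed AUCs; the abstract does not state which pooling model or transformation the authors used (AUCs are often pooled on the logit scale). The function name `pool_random_effects` is hypothetical, and the inputs in the usage note are placeholders, not data from the included studies.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-study estimates (e.g., AUCs) with a DerSimonian-Laird
    random-effects model and report Higgins I^2 as a percentage.

    Assumption: estimates are pooled on their original scale.
    """
    k = len(estimates)
    # Inverse-variance (fixed-effect) weights
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Higgins I^2: share of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    # Normal-approximation 95% confidence interval
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, i2
```

A call such as `pool_random_effects([0.78, 0.84, 0.80], [0.02, 0.03, 0.025])` (placeholder values) returns the pooled estimate, its 95% CI, and I². Running it once on the FL results and once on the CML results would mirror the within-group comparison described in the methods.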
This systematic review and meta-analysis demonstrates that FL can achieve performance similar to that of CML while mitigating privacy risks in predicting mortality across various settings. Owing to the small number of studies and the moderate proportion of models at high risk of bias, the effect estimates might be imprecise.