Li Siqi, Miao Di, Wu Qiming, Hong Chuan, D'Agostino Danny, Li Xin, Ning Yilin, Shang Yuqing, Wang Ziwen, Liu Molei, Fu Huazhu, Ong Marcus Eng Hock, Haddadi Hamed, Liu Nan
Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore.
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.
Health Data Sci. 2024 Dec 4;4:0196. doi: 10.34133/hds.0196. eCollection 2024.
Federated learning (FL) holds promise for safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also developed privacy-preserving algorithms, though these are less recognized. Our goal was to bridge this gap with the first comprehensive comparison of FL frameworks from both domains. We assessed 7 FL frameworks, encompassing both engineering-based and statistical FL algorithms, and compared them against local and centralized modeling of logistic regression and least absolute shrinkage and selection operator (Lasso). Our evaluation utilized both simulated data and real-world emergency department data, focusing on comparing both estimated model coefficients and the performance of model predictions. The findings reveal that statistical FL algorithms produce much less biased estimates of model coefficients. Conversely, engineering-based methods can yield models with slightly better prediction performance, occasionally outperforming both centralized and statistical FL models. This study underscores the relative strengths and weaknesses of both types of methods, providing recommendations for their selection based on distinct study characteristics. Furthermore, we emphasize the critical need to raise awareness of and integrate these methods into future applications of FL within the healthcare domain.
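The comparison described in the abstract (local, centralized, and federated fitting of logistic regression) can be illustrated with a small sketch. The abstract does not name the 7 frameworks, so the two federated routines here are stand-ins chosen for illustration, not the study's methods: `fit_newton` over several sites assumes a summary-statistic approach in which each site shares only its gradient and Hessian, and `fit_fedavg` assumes a FedAvg-style scheme that averages locally updated coefficients. The simulated sites, sample sizes, true coefficients, learning rate, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_hess(beta, X, y):
    """Gradient and Hessian of the summed negative log-likelihood at beta."""
    p = sigmoid(X @ beta)
    g = X.T @ (p - y)
    H = (X * (p * (1.0 - p))[:, None]).T @ X
    return g, H

def fit_newton(sites, n_iter=25):
    """Newton's method in which each site contributes only (gradient, Hessian).

    With a single pooled "site" this is ordinary centralized logistic regression.
    """
    beta = np.zeros(sites[0][0].shape[1])
    for _ in range(n_iter):
        parts = [grad_hess(beta, X, y) for X, y in sites]
        g = sum(part[0] for part in parts)
        H = sum(part[1] for part in parts)
        beta = beta - np.linalg.solve(H, g)
    return beta

def fit_fedavg(sites, rounds=50, local_steps=5, lr=0.5):
    """FedAvg-style sketch: local gradient steps, then a size-weighted average."""
    beta = np.zeros(sites[0][0].shape[1])
    n_total = sum(len(y) for _, y in sites)
    for _ in range(rounds):
        weighted = []
        for X, y in sites:
            b = beta.copy()
            for _ in range(local_steps):
                g, _ = grad_hess(b, X, y)
                b -= lr * g / len(y)  # step on the site's mean gradient
            weighted.append(len(y) * b)
        beta = sum(weighted) / n_total
    return beta

def simulate_sites(seed=0, sizes=(300, 500, 800), true_beta=(-0.5, 1.0, -1.0)):
    """Hypothetical sites whose covariate means differ (mild heterogeneity)."""
    rng = np.random.default_rng(seed)
    true_beta = np.asarray(true_beta)
    sites = []
    for k, n in enumerate(sizes):
        X = np.column_stack([np.ones(n), rng.normal(loc=0.3 * k, size=(n, 2))])
        y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)
        sites.append((X, y))
    return sites

sites = simulate_sites()
pooled = [(np.vstack([X for X, _ in sites]),
           np.concatenate([y for _, y in sites]))]

beta_central = fit_newton(pooled)                # all data in one place
beta_shared = fit_newton(sites)                  # summary statistics only
beta_fedavg = fit_fedavg(sites)                  # averaged local updates
beta_local = [fit_newton([s]) for s in sites]    # each site on its own

print("centralized:", beta_central)
print("shared grad/Hessian:", beta_shared)
print("FedAvg-style:", beta_fedavg)
```

In this sketch the gradient-and-Hessian variant reproduces the centralized coefficients up to floating-point error, because the pooled gradient and Hessian are exactly the sums of the per-site ones. The FedAvg-style estimate can differ from the centralized fit when sites are heterogeneous. How large such gaps are in practice, and how they affect prediction, is what the study evaluates; this toy example makes no claim about those results.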