Tang Guojun, Black Jason E, Williamson Tyler S, Drew Steve H
Department of Electrical and Software Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB.
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB.
AMIA Annu Symp Proc. 2025 May 22;2024:1099-1108. eCollection 2024.
Integrating Electronic Health Records (EHR) with machine learning presents opportunities to improve the accuracy and accessibility of data-driven diabetes prediction. In particular, data-driven machine learning models can identify patients at high risk of diabetes early, potentially enabling more effective therapeutic strategies and reducing healthcare costs. However, regulatory restrictions create barriers to developing centralized predictive models. This paper addresses that challenge with a federated learning approach, which aggregates predictive models without centralized data storage or processing, thereby avoiding the privacy issues associated with pooling patient data. This marks the first application of federated learning to diabetes prediction using real Canadian clinical datasets, extracted from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN), without sharing patient data across provinces. We address class imbalance through downsampling and compare federated learning performance against province-based and centralized models. Experimental results show that the federated multilayer perceptron (MLP) model performs similarly to or better than the model trained with the centralized approach. However, the federated logistic regression model performed worse than its centralized counterpart.
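The abstract does not state which aggregation rule or downsampling scheme the authors used. The following is a minimal sketch under two labelled assumptions. First, each province randomly undersamples its majority class locally. Second, a server combines locally trained logistic regression weights by FedAvg-style averaging weighted by sample count. All function names (`downsample`, `local_update`, `fed_avg`, `federated_train`) are hypothetical, and plain gradient descent stands in for whatever optimizer the study used. Only model weights cross site boundaries; raw rows stay with each client.

```python
import math
import random


def downsample(rows, labels, seed=0):
    """Randomly drop majority-class rows so both classes have equal counts (run locally per site)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, rows, labels, lr=0.1, epochs=5):
    """Gradient descent on the logistic loss, starting from the current global weights.

    The last entry of `weights` is the bias term.
    """
    w = list(weights)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = p - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its local sample count (assumed FedAvg-style rule)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def federated_train(clients, dim, rounds=20):
    """clients: list of (rows, labels) pairs, one per hypothetical province.

    Only weight vectors are exchanged with the server; raw rows never leave a client.
    """
    global_w = [0.0] * (dim + 1)
    balanced = [downsample(rows, labels) for rows, labels in clients]
    for _ in range(rounds):
        updates = [local_update(global_w, r, y) for r, y in balanced]
        global_w = fed_avg(updates, [len(y) for _, y in balanced])
    return global_w
```

For example, `federated_train([client_a, client_b], dim=1)` trains a one-feature model across two synthetic clients and returns `[weight, bias]`. Downsampling happens inside each client before any training. As a result, the server never needs to see class counts or patient records, only the averaged weight vectors.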