Univ Rennes, CHU Rennes, INSERM, LTSI-UMR 1099, F-35000, Rennes, France.
IMT Atlantique, INSERM, LATIM-UMR 1101, F-29200, Brest, France.
Stud Health Technol Inform. 2024 Aug 22;316:611-615. doi: 10.3233/SHTI240488.
Securely extracting Personally Identifiable Information (PII) from Electronic Health Records (EHRs) raises significant privacy and security challenges. This study explores Federated Learning (FL) as a way to address these challenges for French EHRs. Using a multilingual BERT model in an FL simulation involving 20 hospitals, each represented by a distinct medical department or pole, we compared two setups: individual models, where each hospital trains only on its own training and validation data without taking part in the FL process, and federated models, where multiple hospitals collaborate to train a global FL model. Our findings show that FL models not only preserve data confidentiality but also outperform the individual models. The global FL model achieved an F1 score of 75.7%, close to the 78.5% obtained with the centralized approach. This research underscores the potential of FL for extracting PII from EHRs and encourages its broader adoption in health data analysis.
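The abstract does not say how the hospitals' models are combined into the global FL model. As a minimal sketch, assuming the widely used federated averaging (FedAvg) rule, the server could average each parameter across hospitals, weighted by the number of local training examples. The function name `fed_avg`, the parameter layout (a dict of flat float lists), and the toy values are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Combine client parameters into global parameters (FedAvg, assumed).

    Each parameter value is averaged across clients, weighted by the
    number of local training examples each client holds.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_weights: Weights = {}
    for name, reference in client_weights[0].items():
        global_weights[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return global_weights


# Toy example: three hypothetical hospitals with one two-value parameter each.
hospital_updates = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
hospital_sizes = [100, 100, 200]  # local training examples per hospital
print(fed_avg(hospital_updates, hospital_sizes))  # {'w': [3.5, 4.5]}
```

Only model parameters and sample counts reach the server in this scheme; the records themselves stay at each hospital, which is the confidentiality property the abstract refers to. The individual-model baseline corresponds to skipping this aggregation step, so each hospital keeps its own locally trained parameters.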