Faculty of Law, University of Hamburg, Hamburg, Germany.
Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany.
J Med Internet Res. 2023 Mar 30;25:e41588. doi: 10.2196/41588.
The collection, storage, and analysis of large data sets are relevant in many sectors. Especially in the medical field, the processing of patient data promises great progress in personalized health care. However, it is strictly regulated, such as by the General Data Protection Regulation (GDPR). These regulations mandate strict data security and data protection and, thus, create major challenges for collecting and using large data sets. Technologies such as federated learning (FL), especially paired with differential privacy (DP) and secure multiparty computation (SMPC), aim to solve these challenges.
This scoping review aimed to summarize the current discussion on the legal questions and concerns related to FL systems in medical research. We were particularly interested in whether and to what extent FL applications and training processes are compliant with the GDPR data protection law and whether the use of the aforementioned privacy-enhancing technologies (DP and SMPC) affects this legal compliance. We placed special emphasis on the consequences for medical research and development.
We performed a scoping review according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). We reviewed articles on Beck-Online, SSRN, ScienceDirect, arXiv, and Google Scholar published in German or English between 2016 and 2022. We examined 4 questions: whether local and global models constitute "personal data" as per the GDPR; what "roles," as defined by the GDPR, the various parties in an FL system occupy; who controls the data at the various stages of the training process; and how, if at all, the use of privacy-enhancing technologies affects these findings.
We identified and summarized the findings of 56 relevant publications on FL. Local and likely also global models constitute personal data according to the GDPR. FL strengthens data protection but is still vulnerable to a number of attacks and the possibility of data leakage. These concerns can be successfully addressed through the privacy-enhancing technologies SMPC and DP.
Combining FL with SMPC and DP is necessary to fulfill the legal data protection requirements (GDPR) in medical research dealing with personal data. Even though some technical and legal challenges remain, for example, the possibility of successful attacks on the system, combining FL with SMPC and DP creates enough security to satisfy the legal requirements of the GDPR. This combination thereby provides an attractive technical solution for health institutions willing to collaborate without exposing their data to risk: from a legal perspective, it offers enough built-in security measures to satisfy data protection requirements, and from a technical perspective, it yields secure systems with performance comparable to that of centralized machine learning applications.
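To make the combination the review recommends concrete, the following is a minimal, purely illustrative sketch (not any system evaluated in the review) of federated aggregation in which each client perturbs its local model update with Gaussian noise (a basic DP mechanism, with privacy calibration omitted) and the updates are combined via additive secret sharing, a simple form of SMPC, so that no single party ever sees a raw local update. All names and parameters here are hypothetical.

```python
# Illustrative sketch: FL aggregation with local DP noise and
# additive secret sharing (SMPC-style secure aggregation).
import numpy as np

rng = np.random.default_rng(0)

def dp_noise(update, sigma):
    """Perturb a local update with Gaussian noise (DP calibration omitted)."""
    return update + rng.normal(0.0, sigma, size=update.shape)

def secret_share(vector, n_parties):
    """Split a vector into n additive shares that sum to the original."""
    shares = [rng.normal(size=vector.shape) for _ in range(n_parties - 1)]
    shares.append(vector - sum(shares))
    return shares

# Three hospitals, each holding a local model update (toy 4-d vectors).
local_updates = [np.ones(4) * k for k in (1.0, 2.0, 3.0)]

# Each client noises its update locally, then secret-shares it.
noised = [dp_noise(u, sigma=0.01) for u in local_updates]
all_shares = [secret_share(u, n_parties=3) for u in noised]

# Aggregator party j sums the j-th share from every client; the global
# update is recovered only from the combined partial sums, so no party
# observes an individual hospital's (even noised) raw update in full.
partial_sums = [sum(shares[j] for shares in all_shares) for j in range(3)]
global_update = sum(partial_sums) / len(local_updates)

# The securely aggregated result matches the plain average up to DP noise.
print(np.allclose(global_update, np.mean(local_updates, axis=0), atol=0.1))
```

The key property the sketch demonstrates is that the additive shares individually look like random vectors, yet their sum reconstructs the (noised) aggregate exactly, which is the intuition behind why FL combined with SMPC and DP limits what any single participant, including the coordinating server, can learn.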