Ryffel Théo, Créquit Perrine, Baillet Maëlle, Paumier Jason, Marfoq Yasmine, Girardot Olivier, Chanet Thierry, Sy Ronan, Bayssat Louise, Mazières Julien, Vuiblet Vincent, Ancel Julien, Dewolf Maxime, Margraff François, Bachot Camille, Chmiel Jacek
Arkhn, 9, rue d'Alexandrie, Paris, 75002, France.
Hôpital Foch, Suresnes, France.
JMIR Med Inform. 2025 Jul 31;13:e59685. doi: 10.2196/59685.
Federated analytics in health care allows researchers to perform statistical queries on remote datasets without access to the raw data. This method arose from the need to perform statistical analysis on larger datasets collected at multiple health care centers while avoiding regulatory, governance, and privacy issues that might arise if raw data were collected at a central location outside the health care centers. Despite some pioneering work, federated analytics is still not widely used on real-world data, and to our knowledge, no real-world study has yet combined it with other privacy-enhancing techniques such as differential privacy (DP).
The first objective of this study was to deploy a federated architecture in a real-world setting. The oncology study used for this deployment compared the health care management of patients with metastatic non-small cell lung cancer before and after the first wave of the COVID-19 pandemic. The second objective was to test DP in this real-world scenario to assess its practicality and usefulness as a privacy-enhancing technology.
A federated architecture platform was set up in the Toulouse, Reims, and Foch centers. After harmonization of the data in each center, statistical analyses were performed using DataSHIELD (Data aggregation through anonymous summary-statistics from harmonized individual-level databases), a federated analysis R library, and a new open-source DP DataSHIELD package was implemented (dsPrivacy).
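To make the idea of federated analytics concrete, the following minimal Python sketch shows how a pooled mean can be computed when each center releases only aggregates (a count and a sum) and never patient-level rows. It is an illustration of the general principle, not the DataSHIELD API or the study's code; the names `SiteSummary`, `local_summary`, and `pooled_mean` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteSummary:
    """Aggregate released by one center; no patient-level rows leave the site."""
    n: int        # number of observations at the site
    total: float  # sum of the variable at the site


def local_summary(values: List[float]) -> SiteSummary:
    # Runs inside the hospital: reduces raw values to a count and a sum.
    return SiteSummary(n=len(values), total=sum(values))


def pooled_mean(summaries: List[SiteSummary]) -> float:
    # Runs on the analyst side: combines per-site aggregates only.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Hypothetical values standing in for three centers.
site_a = local_summary([2.0, 4.0])
site_b = local_summary([6.0])
site_c = local_summary([8.0, 10.0, 12.0])
result = pooled_mean([site_a, site_b, site_c])
print(result)
```

The pooled result equals the mean that would be obtained on the concatenated data, which is why the analysis does not require a central copy of the records. A real deployment would also need disclosure controls (for example, refusing to answer when a site's count is very small), which this sketch omits.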
A total of 50 patients were enrolled in the Toulouse and Reims centers and 49 in the Foch center. We showed that DataSHIELD is a practical tool for conducting our study efficiently across all 3 centers without exposing data on a central node, once a secure network between the hospitals had been configured. All planned aggregated results were successfully generated. We also observed that DP can be implemented in practice with promising trade-offs between privacy and accuracy, and we built a library that should prove useful for future work.
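The trade-off between privacy and accuracy can be illustrated with the Laplace mechanism, a standard way to obtain DP for a bounded mean. The Python sketch below is not the dsPrivacy implementation: it assumes the sample size is public, that values are clipped to known bounds, and that neighboring datasets differ by replacing one record. The function names, the bounds, and the simulated data are all hypothetical.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_bounded_mean(values: List[float], lower: float, upper: float,
                    epsilon: float, rng: random.Random) -> float:
    """Epsilon-DP mean via the Laplace mechanism (assumes n is public)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values or upper <= lower:
        raise ValueError("need data and lower < upper")
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values: List[float], epsilon: float,
                   rng: random.Random, repeats: int = 2000) -> float:
    # Average absolute deviation of the DP mean from the clipped, non-private mean.
    lower, upper = 18.0, 90.0  # hypothetical bounds
    clipped_mean = sum(min(max(v, lower), upper) for v in values) / len(values)
    errors = [abs(dp_bounded_mean(values, lower, upper, epsilon, rng) - clipped_mean)
              for _ in range(repeats)]
    return sum(errors) / repeats


rng = random.Random(0)
data = [rng.uniform(18.0, 90.0) for _ in range(50)]  # simulated, not study data
for eps in (0.1, 1.0, 10.0):
    print(eps, mean_abs_error(data, eps, rng))
```

Smaller values of epsilon give stronger privacy but larger noise. With the noise scale `(upper - lower) / (n * epsilon)`, the expected absolute error shrinks as either epsilon or the per-site sample size grows, which is why small cohorts make the trade-off harder.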
The federated architecture platform made it possible to run a multicenter study on real-world oncology data while ensuring strong privacy guarantees using differential privacy.