European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France.
Foundation Lygature, Jaarbeursplein 6, 3521 AL, Utrecht, The Netherlands.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae262.
Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
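The contrast between the two models can be illustrated with a minimal sketch. It is not taken from any of the five platforms named above: the `Site` class, the function names and the numeric values are all hypothetical. It assumes the only statistic of interest is a mean, which can be rebuilt exactly from per-site sums and counts. In the centralized path every record is copied into one pool; in the federated path each site releases only an aggregate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """Hypothetical data holder; in the federated path its records stay local."""
    name: str
    values: List[float]

    def local_aggregate(self) -> Tuple[float, int]:
        # Only a sum and a count are released, never the individual records.
        return sum(self.values), len(self.values)


def centralized_mean(sites: List[Site]) -> float:
    # Centralized model: all records are first copied into one pooled store.
    pooled = [v for site in sites for v in site.values]
    return sum(pooled) / len(pooled)


def federated_mean(sites: List[Site]) -> float:
    # Federated model: the coordinator combines per-site aggregates only.
    aggregates = [site.local_aggregate() for site in sites]
    total = sum(s for s, _ in aggregates)
    count = sum(n for _, n in aggregates)
    return total / count


# Made-up measurements for two illustrative sites.
sites = [
    Site("site_a", [5.0, 7.0, 6.0]),
    Site("site_b", [4.0, 8.0]),
]

print(centralized_mean(sites))
print(federated_mean(sites))
```

Both functions return the same value here because a mean decomposes into sums and counts. Analyses that do not decompose this way, such as record-level linkage across sources, are harder to run without pooling the data, which is consistent with the abstract's observation that centralized models facilitate linkage. A real federated deployment would also need access control and disclosure checks on the released aggregates, which this sketch omits.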