Carvalho Tânia, Antunes Luís, Costa Santos Cristina, Moniz Nuno
Departamento de Ciência de Computadores, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, s/n, 4169-007, Porto, Portugal.
TekPrivacy, Lda, R. Alfredo Allen 455 461, 4200-135, Porto, Portugal.
Sci Data. 2025 Feb 12;12(1):248. doi: 10.1038/s41597-025-04506-x.
The Covid-19 pandemic has affected the world at multiple levels. Data sharing was pivotal for advancing research to understand the underlying causes and implement effective containment strategies. In response, many countries have facilitated access to daily case data to support research initiatives, fostering collaboration between organisations and making such data available to the public through open data platforms. Despite the many advantages of data sharing, a major concern before releasing health data is its impact on individuals' privacy. Any such sharing process should adhere to state-of-the-art methods in Data Protection by Design and by Default. In this paper, we use a Covid-19 data set from Portugal's second-largest hospital to show that it is feasible to ensure data privacy while improving the quality and maintaining the utility of the data. Our goal is to demonstrate that knowledge exchange in multidisciplinary teams of healthcare practitioners, data privacy experts, and data science experts is crucial to co-developing strategies that ensure high utility in de-identified data.