Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
J Am Med Inform Assoc. 2023 Apr 19;30(5):907-914. doi: 10.1093/jamia/ocad021.
The All of Us Research Program makes individual-level data available to researchers while protecting participants' privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data were transformed to meet generally accepted re-identification risk levels.
At the time of the study, the resource comprised 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the re-identification risk for each participant using a state-of-the-art adversarial model that specifically assumes the adversary knows that a given individual is a participant in the program. We confirmed that the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics.
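To make the risk computation concrete, the following is a minimal sketch of one common way to score re-identification risk when the adversary is assumed to know that the target is in the dataset: each record's risk is taken as 1 divided by the number of records that share its quasi-identifier values. This is an illustrative stand-in, not the adversarial model used in the study. The attribute names (`state`, `birth_year`, `sex`), the helper functions, and the toy records are all hypothetical; only the 0.09 threshold comes from the text above.

```python
from collections import Counter

# Hypothetical quasi-identifiers; the study's actual attribute set is not listed here.
QUASI_IDENTIFIERS = ("state", "birth_year", "sex")
RISK_THRESHOLD = 0.09  # expected-risk threshold quoted in the abstract


def per_record_risk(records, keys=QUASI_IDENTIFIERS):
    """Return 1 / (equivalence-class size) for every record.

    An equivalence class is the set of records that share the same
    values on all quasi-identifiers.
    """
    def signature(record):
        return tuple(record[k] for k in keys)

    class_sizes = Counter(signature(r) for r in records)
    return [1.0 / class_sizes[signature(r)] for r in records]


def expected_risk(risks):
    """Average per-record risk across the dataset."""
    return sum(risks) / len(risks)


# Toy data: two records are indistinguishable, two are unique.
records = [
    {"state": "TN", "birth_year": 1980, "sex": "F"},
    {"state": "TN", "birth_year": 1980, "sex": "F"},
    {"state": "TN", "birth_year": 1975, "sex": "M"},
    {"state": "KY", "birth_year": 1990, "sex": "F"},
]

risks = per_record_risk(records)
meets_threshold = expected_risk(risks) <= RISK_THRESHOLD
```

On this toy input the two matching records each score 0.5 and the two unique records score 1.0, so the expected risk is 0.75 and the threshold is not met. Generalizing attributes (for example, coarsening geography) enlarges the equivalence classes and lowers these scores, which is the intuition behind the amendments described above.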
The results indicated that the 95th percentile of the re-identification risk across all participants is below current thresholds. At the same time, we observed that risk levels were higher for certain race, ethnicity, and gender groups.
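As a rough illustration of how a percentile-based check can be broken down by demographic group, the sketch below computes the 95th percentile of per-participant risk overall and within each group, then flags groups that exceed a threshold. The nearest-rank percentile definition, the group labels, the function names, and the risk values are assumptions chosen for illustration; the abstract does not specify how the percentile was computed, and reusing 0.09 as the comparison value here is likewise an assumption.

```python
import math
from collections import defaultdict

RISK_THRESHOLD = 0.09  # value quoted in the abstract; its use here is illustrative


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def percentile_by_group(risks, groups, p=95):
    """Compute the p-th percentile of risk separately for each group label."""
    buckets = defaultdict(list)
    for risk, group in zip(risks, groups):
        buckets[group].append(risk)
    return {g: percentile_nearest_rank(v, p) for g, v in buckets.items()}


# Hypothetical per-participant risks and group labels.
risks = [0.01, 0.02, 0.05, 0.20]
groups = ["A", "A", "B", "B"]

overall_p95 = percentile_nearest_rank(risks, 95)
group_p95 = percentile_by_group(risks, groups)
flagged = sorted(g for g, value in group_p95.items() if value > RISK_THRESHOLD)
```

A per-group breakdown of this kind can surface subpopulations whose risk sits well above the dataset-wide figure, which is the pattern the results describe.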
Although the re-identification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring for data misuse, and penalties for users who violate the terms of service.