Wan Zhiyu, Vorobeychik Yevgeniy, Xia Weiyi, Liu Yongtai, Wooders Myrna, Guo Jia, Yin Zhijun, Clayton Ellen Wright, Kantarcioglu Murat, Malin Bradley A
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA.
Sci Adv. 2021 Dec 10;7(50):eabe9986. doi: 10.1126/sciadv.abe9986.
Person-specific biomedical data are now widely collected, but their sharing raises privacy concerns, specifically about the re-identification of seemingly anonymous records. Formal re-identification risk assessment frameworks can inform decisions about whether and how to share data; current techniques, however, focus on scenarios where the data recipients use only one resource for re-identification purposes. This is a concern because recent attacks show that adversaries can access multiple resources, combining them in a stage-wise manner, to enhance the chance of an attack’s success. In this work, we model re-identification as a two-player Stackelberg game of perfect information, which can be applied to assess risk and to suggest an optimal data sharing strategy based on a privacy-utility tradeoff. We report on experiments with large-scale genomic datasets to show that, using game-theoretic models that account for adversarial capabilities to launch multistage attacks, most data can be effectively shared with low re-identification risk.
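The Stackelberg structure described in the abstract can be illustrated with a small backward-induction sketch. This is not the authors' model: the strategy names, payoffs, costs, and probabilities below are all hypothetical, and the adversary's multistage capability is reduced to a choice among no attack, a one-resource attack, and a two-resource attack. The data sharer (leader) commits to a sharing strategy, the adversary (follower) observes it and picks the attack that maximizes its own payoff, and the sharer chooses the strategy that is best given that response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A hypothetical data sharing strategy available to the leader."""
    name: str
    utility: float      # benefit to the sharer from releasing data this way
    p_one_stage: float  # assumed re-identification probability using one resource
    p_two_stage: float  # assumed probability when two resources are chained


# All numbers are illustrative assumptions, not values from the paper.
ATTACK_COST = {"none": 0.0, "one_stage": 10.0, "two_stage": 25.0}
ADVERSARY_GAIN = 100.0  # adversary's gain from a successful re-identification
SHARER_LOSS = 300.0     # sharer's loss from a successful re-identification


def success_prob(strategy: Strategy, attack: str) -> float:
    """Probability that the given attack re-identifies a record."""
    return {
        "none": 0.0,
        "one_stage": strategy.p_one_stage,
        "two_stage": strategy.p_two_stage,
    }[attack]


def best_response(strategy: Strategy) -> str:
    """Follower's move: the attack with the highest expected payoff.

    "none" is listed first in ATTACK_COST, so ties resolve to not attacking.
    """
    def payoff(attack: str) -> float:
        return success_prob(strategy, attack) * ADVERSARY_GAIN - ATTACK_COST[attack]

    return max(ATTACK_COST, key=payoff)


def sharer_payoff(strategy: Strategy, attack: str) -> float:
    """Leader's payoff: data utility minus expected loss from the attack."""
    return strategy.utility - success_prob(strategy, attack) * SHARER_LOSS


def solve(strategies: list[Strategy]) -> Strategy:
    """Backward induction: evaluate each strategy against its best response."""
    return max(strategies, key=lambda s: sharer_payoff(s, best_response(s)))


STRATEGIES = [
    Strategy("share_all", utility=100.0, p_one_stage=0.30, p_two_stage=0.60),
    Strategy("mask_some", utility=80.0, p_one_stage=0.08, p_two_stage=0.20),
    Strategy("share_none", utility=0.0, p_one_stage=0.0, p_two_stage=0.0),
]

if __name__ == "__main__":
    chosen = solve(STRATEGIES)
    print(chosen.name, best_response(chosen))
```

With these assumed numbers, sharing everything makes the two-resource attack worthwhile for the adversary, while partial masking lowers the success probability enough that no attack pays off, so the sharer keeps most of the utility. A model that considered only the one-resource attack would underestimate the adversary's best payoff under `share_all`, which is the gap the abstract points to.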