Enabling realistic health data re-identification risk assessment through adversarial modeling.

Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Publication information

J Am Med Inform Assoc. 2021 Mar 18;28(4):744-752. doi: 10.1093/jamia/ocaa327.

Abstract

OBJECTIVE

Re-identification risk methods for biomedical data often assume a worst case, in which attackers know all identifiable features (eg, age and race) about a subject. Yet, worst-case adversarial modeling can overestimate risk and induce heavy editing of shared data. The objective of this study is to introduce a framework for assessing the risk considering the attacker's resources and capabilities.

MATERIALS AND METHODS

We integrate 3 established risk measures (ie, prosecutor, journalist, and marketer risks) and compute re-identification probabilities for data subjects. This probability is dependent on an attacker's capabilities (eg, ability to obtain external identified resources) and the subject's decision on whether to reveal their participation in a dataset. We illustrate the framework through case studies using data from over 1 000 000 patients from Vanderbilt University Medical Center and show how re-identification risk changes when attackers are pragmatic and use 2 known resources for attack: (1) voter registration lists and (2) social media posts.
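As a rough illustration of the 3 risk measures named above, the sketch below uses their common textbook definitions: prosecutor risk as 1 over the number of matching records in the shared dataset, journalist risk as 1 over the number of matching people in the population, and marketer risk as the expected fraction of dataset records an attacker would match correctly. The quasi-identifiers (age and race), the toy records, and all function names are assumptions made for illustration; this is not the study's integrated framework, which additionally models attacker capabilities and the subject's decision to reveal participation.

```python
from collections import Counter

# Illustrative quasi-identifiers, following the abstract's "age and race" example.
QUASI_IDENTIFIERS = ("age", "race")


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def prosecutor_risk(record, dataset_sizes, keys=QUASI_IDENTIFIERS):
    """Attacker knows the target is in the dataset: 1 / matching dataset records."""
    return 1.0 / dataset_sizes[tuple(record[k] for k in keys)]


def journalist_risk(record, population_sizes, keys=QUASI_IDENTIFIERS):
    """Attacker does not know who is in the dataset: 1 / matching population members."""
    return 1.0 / population_sizes[tuple(record[k] for k in keys)]


def marketer_risk(dataset_sizes, population_sizes):
    """Expected fraction of dataset records matched correctly across all groups."""
    n = sum(dataset_sizes.values())
    expected_hits = sum(f / population_sizes[g] for g, f in dataset_sizes.items())
    return expected_hits / n


# Toy population of 6 people; 3 of them appear in the shared dataset.
population = [
    {"age": 34, "race": "A"},
    {"age": 34, "race": "A"},
    {"age": 34, "race": "A"},
    {"age": 34, "race": "A"},
    {"age": 71, "race": "B"},
    {"age": 71, "race": "B"},
]
dataset = [population[0], population[1], population[4]]

dataset_sizes = group_sizes(dataset)
population_sizes = group_sizes(population)

unique_record = dataset[2]  # the only (71, "B") record in the dataset
print(prosecutor_risk(unique_record, dataset_sizes))    # 1.0
print(journalist_risk(unique_record, population_sizes))  # 0.5
print(marketer_risk(dataset_sizes, population_sizes))    # (2/4 + 1/2) / 3
```

In this toy setting, the record that is unique in the dataset has the maximum prosecutor risk, while its journalist risk is halved because 2 people in the population share the same values.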

RESULTS

Our framework illustrates that the risk is substantially smaller in the pragmatic scenarios than in the worst case. Our experiments yield a median worst-case risk of 0.987 (where 0 is least risky and 1 is most risky); however, the median reduction in risk was 90.1% in the voter registration scenario and 100% in the social media posts scenario. Notably, these observations hold true for a wide range of adversarial capabilities.
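One plausible reading of "median reduction in risk" is a per-record relative reduction, (worst case minus pragmatic) divided by worst case, summarized by its median. The sketch below assumes that reading; the abstract does not spell out the calculation, and the risk values and function name here are invented purely for illustration.

```python
from statistics import median


def relative_reduction(worst_case, pragmatic):
    """Fraction by which the pragmatic risk falls below the worst-case risk."""
    if worst_case == 0:
        return 0.0  # nothing to reduce when the worst-case risk is already zero
    return (worst_case - pragmatic) / worst_case


# Hypothetical per-record risks, not data from the study.
worst = [1.0, 1.0, 0.5, 0.25]
pragmatic = [0.1, 0.0, 0.05, 0.25]

reductions = [relative_reduction(w, p) for w, p in zip(worst, pragmatic)]
median_reduction = median(reductions)
print(f"median reduction: {median_reduction:.1%}")
```

Under this reading, a median reduction of 100% would mean that at least half of the records have a pragmatic risk of zero.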

CONCLUSIONS

This research illustrates that re-identification risk is situationally dependent and that appropriate adversarial modeling may permit biomedical data sharing on a wider scale than is currently the case.
