Expanding the attack surface: Robust profiling attacks threaten the privacy of sparse behavioral data.

Authors

Tournier Arnaud J, de Montjoye Yves-Alexandre

Affiliations

Department of Computing, Imperial College London, London, UK.

Data Science Institute, Imperial College London, London, UK.

Publication information

Sci Adv. 2022 Aug 19;8(33):eabl6464. doi: 10.1126/sciadv.abl6464.

DOI:10.1126/sciadv.abl6464

PMID:35984877

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11323786/

Abstract

Behavioral data, collected from our daily interactions with technology, have driven scientific advances. Yet, the collection and sharing of this data raise legitimate privacy concerns, as individuals can often be reidentified. Current identification attacks, however, require auxiliary information to roughly match the information available in the dataset, limiting their applicability. We here propose an entropy-based profiling model to learn time-persistent profiles. Using auxiliary information about a single target collected over a nonoverlapping time period, we show that individuals are correctly identified 79% of the time in a large location dataset of 0.5 million individuals and 65.2% for a grocery shopping dataset of 85,000 individuals. We further show that accuracy only slowly decreases over time and that the model is robust to state-of-the-art noise addition. Our results show that much more auxiliary information than previously believed can be used to identify individuals, challenging deidentification practices and what currently constitutes legally anonymous data.
