Media Lab, Massachusetts Institute of Technology (MIT), 20 Amherst Street, Cambridge, MA 02139, USA.
Department of Computer Science, Aarhus University, Aabogade 34, Aarhus, 8200, Denmark.
Science. 2015 Jan 30;347(6221):536-9. doi: 10.1126/science.1256297.
Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.
大规模的人类行为数据集有可能从根本上改变我们治疗疾病、设计城市或进行研究的方式。然而,元数据包含敏感信息。理解这些数据集的隐私是广泛使用它们的关键,最终也是影响它们的关键。我们研究了 110 万人的信用卡记录 3 个月的数据,结果表明,四个时空点足以唯一重新识别 90%的个体。我们发现,平均而言,了解交易价格会使重新识别的风险增加 22%。最后,我们发现,即使在任何或所有维度上提供粗略信息的数据集中,也几乎没有匿名性,并且在信用卡元数据中,女性比男性更容易被重新识别。