Andrea Gadotti, Luc Rocher, Florimond Houssiau, Ana-Maria Creţu, Yves-Alexandre de Montjoye
Imperial College London, Exhibition Road, London SW7 2AZ, UK.
University of Oxford, Wellington Square, Oxford OX1 2JD, UK.
Sci Adv. 2024 Jul 19;10(29):eadn7053. doi: 10.1126/sciadv.adn7053. Epub 2024 Jul 17.
Information about us, our actions, and our preferences is created at scale through surveys and scientific studies, or as a result of our interactions with digital devices such as smartphones and fitness trackers. The ability to safely share and analyze such data is key for scientific and societal progress. Scientists and policy-makers regard anonymization as one of the main ways to share data while minimizing privacy risks. In this review, we offer a pragmatic perspective on the modern literature on privacy attacks and anonymization techniques. We discuss traditional de-identification techniques and their strong limitations in the age of big data. We then turn our attention to modern approaches for sharing anonymous aggregate data, such as data query systems, synthetic data, and differential privacy. We find that, although no perfect solution exists, applying modern techniques while auditing their guarantees against attacks is the best approach to safely use and share data today.