Practical and ready-to-use methodology to assess the re-identification risk in anonymized datasets.

Author information

Sondeck Louis Philippe, Laurent Maryline

Affiliations

COACHMESEC Consulting (Clever Identity), 151 rue des Meuniers, Bagneux, 92220, France.

Samovar, Télécom SudParis, Institut Polytechnique de Paris, 19 Place Marguerite Perey, Palaiseau, 91120, France.

Publication information

Sci Rep. 2025 Jul 2;15(1):23223. doi: 10.1038/s41598-025-04907-3.

Abstract

To prove that a dataset is sufficiently anonymized, many privacy policies suggest that a re-identification risk assessment be performed, but do not provide a precise methodology for doing so, leaving the industry alone with the problem. This paper proposes a practical and ready-to-use methodology for re-identification risk assessment, the originality of which is manifold: (1) it is the first to follow well-known risk analysis methods (e.g. EBIOS) that have been used in the cybersecurity field for years, which consider not only the ability to perform an attack, but also the severity such an attack can have on an individual; (2) it is the first to qualify attributes and values of attributes with e.g. degree of exposure, as known real-world attacks mainly target certain types of attributes and not others; (3) it is the first to provide clear, comprehensible criteria and interpretable, explainable assessment results. In addition, the fine granularity of the methodology makes it possible to score the risk as accurately as possible, and thus maintain good data quality at an acceptable risk, which is very promising for the AI industrial sector. Finally, the implementation of the methodology is illustrated using the publicly available Adult dataset, which was assessed as having a critical risk of re-identification, with 14 concrete cases of individualization.
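As a rough illustration of what an "individualization" case can look like, the sketch below flags records whose combination of quasi-identifier values is unique within a dataset. This is not the paper's methodology, which also weighs severity and attribute exposure. It is only a minimal uniqueness check under stated assumptions: the function name `singled_out`, the choice of quasi-identifiers, and the toy records are all hypothetical and are not taken from the Adult dataset.

```python
from collections import Counter


def singled_out(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination occurs exactly once.

    Hypothetical helper for illustration only; not the authors' scoring method.
    """
    # Build one key per record from the selected attributes.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    # A record is "singled out" when no other record shares its key.
    return [record for record, key in zip(records, keys) if counts[key] == 1]


# Toy records, made up for this example (not real Adult dataset rows).
records = [
    {"age": 39, "sex": "Male", "occupation": "Clerical"},
    {"age": 39, "sex": "Male", "occupation": "Clerical"},
    {"age": 52, "sex": "Female", "occupation": "Executive"},
    {"age": 28, "sex": "Female", "occupation": "Sales"},
    {"age": 28, "sex": "Female", "occupation": "Sales"},
]

unique = singled_out(records, ["age", "sex", "occupation"])
print(len(unique))  # number of records that stand alone on these attributes
```

A uniqueness count like this only speaks to how feasible an attack might be. The abstract's point is that a full assessment should also consider how severe re-identification would be for the individual.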

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d3d/12222771/d4f58eb2a433/41598_2025_4907_Fig1_HTML.jpg
