Luc Rocher, Julien M. Hendrickx, Yves-Alexandre de Montjoye
Oxford Internet Institute, University of Oxford, Oxford, UK.
Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université catholique de Louvain, Louvain-la-Neuve, Belgium.
Nat Commun. 2025 Jan 9;16(1):347. doi: 10.1038/s41467-024-55296-6.
AI techniques are increasingly being used to identify individuals both offline and online. However, quantifying their effectiveness at scale and, by extension, the risks they pose remains a significant challenge. Here, we propose a two-parameter Bayesian model for exact matching techniques and derive an analytical expression for correctness (κ), the fraction of people accurately identified in a population. We then generalize the model to forecast how κ scales from small-scale experiments to the real world, for exact, sparse, and machine learning-based robust identification techniques. Despite having only two degrees of freedom, our method closely fits 476 correctness curves and strongly outperforms curve-fitting methods and entropy-based rules of thumb. Our work provides a principled framework for forecasting the privacy risks posed by identification techniques, while also supporting independent accountability efforts for AI-based biometric systems.
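As a rough illustration of what correctness measures for exact matching, the sketch below computes the fraction of people correctly identified in a synthetic population. It is not the paper's two-parameter Bayesian model. It assumes, purely for illustration, that each person has one discrete record drawn uniformly from a fixed set of values, and that when several people share a record the matcher guesses uniformly at random among them. The function names `exact_match_correctness` and `simulate` are hypothetical.

```python
import random
from collections import Counter


def exact_match_correctness(records):
    """Expected fraction of individuals identified correctly by exact matching.

    Assumption (not taken from the paper): when k individuals share the same
    record, the matcher picks one of them uniformly at random, so each of
    them is identified correctly with probability 1/k.
    """
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")
    counts = Counter(records)
    # A group of k people sharing a record contributes k * (1/k) = 1 expected
    # correct identification, so the total is the number of distinct records.
    return len(counts) / n


def simulate(population_size, num_values=1000, seed=0):
    """Draw one record per person uniformly from num_values possible values
    (an illustrative assumption) and return the resulting correctness."""
    rng = random.Random(seed)
    records = [rng.randrange(num_values) for _ in range(population_size)]
    return exact_match_correctness(records)


if __name__ == "__main__":
    # Correctness tends to fall as the population grows, because more
    # people end up sharing the same record.
    for n in (10, 100, 1_000, 10_000):
        print(n, round(simulate(n), 3))
```

Under these assumptions, correctness equals the number of distinct records divided by the population size, so it can never exceed the ratio of possible values to people. This is the kind of decline with scale that the abstract describes forecasting from small-scale experiments.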