A scaling law to model the effectiveness of identification techniques.

Authors

Luc Rocher, Julien M. Hendrickx, Yves-Alexandre de Montjoye

Affiliations

Oxford Internet Institute, University of Oxford, Oxford, UK.

Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université catholique de Louvain, Louvain-la-Neuve, Belgium.

Publication information

Nat Commun. 2025 Jan 9;16(1):347. doi: 10.1038/s41467-024-55296-6.

DOI: 10.1038/s41467-024-55296-6

PMID: 39788959

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11718298/

Abstract

AI techniques are increasingly being used to identify individuals both offline and online. However, quantifying their effectiveness at scale and, by extension, the risks they pose remains a significant challenge. Here, we propose a two-parameter Bayesian model for exact matching techniques and derive an analytical expression for correctness (κ), the fraction of people accurately identified in a population. We then generalize the model to forecast how κ scales from small-scale experiments to the real world, for exact, sparse, and machine learning-based robust identification techniques. Despite having only two degrees of freedom, our method closely fits 476 correctness curves and strongly outperforms curve-fitting methods and entropy-based rules of thumb. Our work provides a principled framework for forecasting the privacy risks posed by identification techniques, while also supporting independent accountability efforts for AI-based biometric systems.
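To make the definition of correctness (κ) concrete, the following is a minimal sketch of how κ could be measured empirically for exact matching on synthetic data. It is not the authors' two-parameter Bayesian model. The function names, the Zipf-like record distribution, and the rule that ties are broken uniformly at random are all assumptions introduced here for illustration.

```python
import random
from collections import Counter


def empirical_correctness(records):
    """Expected fraction of people correctly identified by exact matching.

    Assumption (not from the source): when several people share the same
    record, the matcher picks one of them uniformly at random, so a person
    in a group of size g is identified with probability 1/g.
    """
    if not records:
        raise ValueError("records must be non-empty")
    group_sizes = Counter(records)
    # Summing 1/g over the g people in a group gives 1 per group, so the
    # expected number of correct matches equals the number of distinct records.
    return len(group_sizes) / len(records)


def sample_population(n, num_values, rng):
    """Draw n hypothetical records from a skewed (Zipf-like) distribution."""
    weights = [1.0 / (rank + 1) for rank in range(num_values)]
    return rng.choices(range(num_values), weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (100, 1_000, 10_000):
        kappa = empirical_correctness(sample_population(n, 5_000, rng))
        print(f"n={n:>6}  kappa={kappa:.3f}")
```

Under these assumptions, κ should generally fall as the population grows, because more people end up sharing the same record. Forecasting that decline from small-scale experiments is the problem the abstract describes.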

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a1/11718298/5eb997146861/41467_2024_55296_Fig1_HTML.jpg

Similar articles

A scaling law to model the effectiveness of identification techniques.

Nat Commun. 2025 Jan 9;16(1):347. doi: 10.1038/s41467-024-55296-6.

Data Obfuscation Through Latent Space Projection for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection.

JMIRx Med. 2025 Mar 12;6:e70100. doi: 10.2196/70100.

Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications.

Hum Genomics. 2025 Feb 23;19(1):16. doi: 10.1186/s40246-025-00716-x.

Combining physical-based model and machine learning to forecast chlorophyll-a concentration in freshwater lakes.

Sci Total Environ. 2024 Jan 10;907:168097. doi: 10.1016/j.scitotenv.2023.168097. Epub 2023 Oct 23.

Ensemble learning approach for advanced metering infrastructure in future smart grids.

PLoS One. 2023 Oct 18;18(10):e0289672. doi: 10.1371/journal.pone.0289672. eCollection 2023.

Translating theory into practice: assessing the privacy implications of concept-based explanations for biomedical AI.

Front Bioinform. 2023 Jul 5;3:1194993. doi: 10.3389/fbinf.2023.1194993. eCollection 2023.

Forecasting life expectancy, years of life lost, and all-cause and cause-specific mortality for 250 causes of death: reference and alternative scenarios for 2016-40 for 195 countries and territories.

Lancet. 2018 Nov 10;392(10159):2052-2090. doi: 10.1016/S0140-6736(18)31694-5. Epub 2018 Oct 16.

Inference-Based Similarity Search in Randomized Montgomery Domains for Privacy-Preserving Biometric Identification.

IEEE Trans Pattern Anal Mach Intell. 2018 Jul;40(7):1611-1624. doi: 10.1109/TPAMI.2017.2727048. Epub 2017 Jul 14.

Artificial intelligence for breast cancer detection and its health technology assessment: A scoping review.

Comput Biol Med. 2025 Jan;184:109391. doi: 10.1016/j.compbiomed.2024.109391. Epub 2024 Nov 22.

Leveraging code-free deep learning for pill recognition in clinical settings: A multicenter, real-world study of performance across multiple platforms.

Artif Intell Med. 2024 Apr;150:102844. doi: 10.1016/j.artmed.2024.102844. Epub 2024 Mar 13.

References cited in this article

Anonymization: The imperfect science of using data while preserving privacy.

Sci Adv. 2024 Jul 19;10(29):eadn7053. doi: 10.1126/sciadv.adn7053. Epub 2024 Jul 17.

Biometric recognition of newborns and young children for vaccinations and health care: a non-randomized prospective clinical trial.

Sci Rep. 2022 Dec 29;12(1):22520. doi: 10.1038/s41598-022-25986-6.

Expanding the attack surface: Robust profiling attacks threaten the privacy of sparse behavioral data.

Sci Adv. 2022 Aug 19;8(33):eabl6464. doi: 10.1126/sciadv.abl6464.

Interaction data are identifiable even across long periods of time.

Nat Commun. 2022 Jan 25;13(1):313. doi: 10.1038/s41467-021-27714-6.

The risk of re-identification remains high even in country-scale location datasets.

Patterns (N Y). 2021 Mar 12;2(3):100204. doi: 10.1016/j.patter.2021.100204.

Temporal and cultural limits of privacy in smartphone app usage.

Sci Rep. 2021 Feb 16;11(1):3861. doi: 10.1038/s41598-021-82294-1.

Enabling realistic health data re-identification risk assessment through adversarial modeling.

J Am Med Inform Assoc. 2021 Mar 18;28(4):744-752. doi: 10.1093/jamia/ocaa327.

Digital technologies in the public-health response to COVID-19.

Nat Med. 2020 Aug;26(8):1183-1192. doi: 10.1038/s41591-020-1011-4. Epub 2020 Aug 7.

Estimating the success of re-identifications in incomplete datasets using generative models.

Nat Commun. 2019 Jul 23;10(1):3069. doi: 10.1038/s41467-019-10933-3.

Dense power-law networks and simplicial complexes.

Phys Rev E. 2018 May;97(5-1):052303. doi: 10.1103/PhysRevE.97.052303.
