A scaling law to model the effectiveness of identification techniques.

Authors

Luc Rocher, Julien M. Hendrickx, Yves-Alexandre de Montjoye

Affiliations

Oxford Internet Institute, University of Oxford, Oxford, UK.

Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université catholique de Louvain, Louvain-la-Neuve, Belgium.

Publication information

Nat Commun. 2025 Jan 9;16(1):347. doi: 10.1038/s41467-024-55296-6.

DOI: 10.1038/s41467-024-55296-6
PMID: 39788959
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11718298/
Abstract

AI techniques are increasingly being used to identify individuals both offline and online. However, quantifying their effectiveness at scale and, by extension, the risks they pose remains a significant challenge. Here, we propose a two-parameter Bayesian model for exact matching techniques and derive an analytical expression for correctness (κ), the fraction of people accurately identified in a population. We then generalize the model to forecast how κ scales from small-scale experiments to the real world, for exact, sparse, and machine learning-based robust identification techniques. Despite having only two degrees of freedom, our method closely fits 476 correctness curves and strongly outperforms curve-fitting methods and entropy-based rules of thumb. Our work provides a principled framework for forecasting the privacy risks posed by identification techniques, while also supporting independent accountability efforts for AI-based biometric systems.
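
To make the definition of correctness concrete, the following is a minimal sketch of how κ could be estimated empirically for exact matching on a toy population. It is not the authors' two-parameter Bayesian model. The uniform attribute distribution, the tie-breaking rule (a uniform guess among all people sharing the same record), and the names `empirical_correctness` and `sample_population` are assumptions made for illustration only.

```python
import random
from collections import Counter


def empirical_correctness(records):
    """Expected fraction of people correctly identified by exact matching.

    Assumption: when k people share the same record, the attacker guesses
    uniformly among them, so each is identified with probability 1/k.
    """
    counts = Counter(records)
    # A group of k identical records contributes k * (1/k) = 1 expected
    # correct identification, so the total equals the number of distinct records.
    return len(counts) / len(records)


def sample_population(n, n_values, rng):
    """Draw n records uniformly from n_values possible values (toy assumption)."""
    return [rng.randrange(n_values) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Correctness shrinks as the population grows relative to the record space.
    for n in (100, 1_000, 10_000, 100_000):
        population = sample_population(n, n_values=10_000, rng=rng)
        print(n, round(empirical_correctness(population), 3))
```

Running the loop shows the qualitative effect described in the abstract: a technique that looks near-perfect in a small experiment can identify a far smaller fraction of people at real-world scale.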

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a1/11718298/5eb997146861/41467_2024_55296_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a1/11718298/5b7c1a0ba199/41467_2024_55296_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a1/11718298/acce1fe0cadf/41467_2024_55296_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a1/11718298/f5077c68306b/41467_2024_55296_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a1/11718298/1412f2d5a615/41467_2024_55296_Fig5_HTML.jpg

Similar articles

1
A scaling law to model the effectiveness of identification techniques.
Nat Commun. 2025 Jan 9;16(1):347. doi: 10.1038/s41467-024-55296-6.
2
Data Obfuscation Through Latent Space Projection for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection.
JMIRx Med. 2025 Mar 12;6:e70100. doi: 10.2196/70100.
3
Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications.
Hum Genomics. 2025 Feb 23;19(1):16. doi: 10.1186/s40246-025-00716-x.
4
Combining physical-based model and machine learning to forecast chlorophyll-a concentration in freshwater lakes.
Sci Total Environ. 2024 Jan 10;907:168097. doi: 10.1016/j.scitotenv.2023.168097. Epub 2023 Oct 23.
5
Ensemble learning approach for advanced metering infrastructure in future smart grids.
PLoS One. 2023 Oct 18;18(10):e0289672. doi: 10.1371/journal.pone.0289672. eCollection 2023.
6
Translating theory into practice: assessing the privacy implications of concept-based explanations for biomedical AI.
Front Bioinform. 2023 Jul 5;3:1194993. doi: 10.3389/fbinf.2023.1194993. eCollection 2023.
7
Forecasting life expectancy, years of life lost, and all-cause and cause-specific mortality for 250 causes of death: reference and alternative scenarios for 2016-40 for 195 countries and territories.
Lancet. 2018 Nov 10;392(10159):2052-2090. doi: 10.1016/S0140-6736(18)31694-5. Epub 2018 Oct 16.
8
Inference-Based Similarity Search in Randomized Montgomery Domains for Privacy-Preserving Biometric Identification.
IEEE Trans Pattern Anal Mach Intell. 2018 Jul;40(7):1611-1624. doi: 10.1109/TPAMI.2017.2727048. Epub 2017 Jul 14.
9
Artificial intelligence for breast cancer detection and its health technology assessment: A scoping review.
Comput Biol Med. 2025 Jan;184:109391. doi: 10.1016/j.compbiomed.2024.109391. Epub 2024 Nov 22.
10
Leveraging code-free deep learning for pill recognition in clinical settings: A multicenter, real-world study of performance across multiple platforms.
Artif Intell Med. 2024 Apr;150:102844. doi: 10.1016/j.artmed.2024.102844. Epub 2024 Mar 13.
