A systematic review of re-identification attacks on health data.

Affiliation

Electronic Health Information Laboratory, CHEO Research Institute, Ottawa, Canada.

Publication information

PLoS One. 2011;6(12):e28071. doi: 10.1371/journal.pone.0028071. Epub 2011 Dec 2.

DOI:10.1371/journal.pone.0028071

PMID:22164229

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3229505/

Abstract

BACKGROUND

Privacy legislation in most jurisdictions allows health data to be disclosed for secondary purposes without patient consent if the data are de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. If this were the case, it would have significant implications for how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these attacks demonstrate weaknesses in current de-identification methods.

METHODS AND FINDINGS

Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (proportion 0.26, 95% CI 0.046-0.478); for attacks on health data the proportion was 0.34 (95% CI 0-0.744). The wide confidence intervals indicate considerable uncertainty around these proportions, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of the fourteen attacks were performed on data that had been de-identified using existing standards. Only one of these two attacks was on health data, and it had a success rate of 0.00013.
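The abstract reports pooled proportions with 95% confidence intervals but does not state the pooling model. Purely as an illustration, the sketch below assumes a DerSimonian-Laird random-effects model on raw proportions with a normal-approximation interval clipped to [0, 1]; this is one common approach and not necessarily the authors' method. The function name `pooled_proportion_dl`, the 0.5 continuity correction, and the example counts are hypothetical.

```python
import math
from typing import List, Tuple


def pooled_proportion_dl(events: List[int], totals: List[int]) -> Tuple[float, float, float]:
    """Pool per-study proportions with a DerSimonian-Laird random-effects model.

    Returns (pooled_estimate, ci_lower, ci_upper) for a 95% normal-approximation
    interval clipped to [0, 1]. Illustrative assumption only, not the review's
    documented method.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need at least two studies and matching list lengths")

    props, variances = [], []
    for x, n in zip(events, totals):
        if not 0 <= x <= n or n <= 0:
            raise ValueError("each study needs 0 <= events <= totals and totals > 0")
        # Hypothetical 0.5 continuity correction so a 0% or 100% study
        # does not get zero variance (and therefore infinite weight).
        p = (x + 0.5) / (n + 1) if x in (0, n) else x / n
        props.append(p)
        variances.append(p * (1 - p) / n)

    # Fixed-effect inverse-variance weights, used only to estimate heterogeneity.
    w = [1 / v for v in variances]
    sw = sum(w)
    p_fixed = sum(wi * pi for wi, pi in zip(w, props)) / sw
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, props))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(props) - 1)) / c)  # between-study variance

    # Random-effects weights add tau^2 to each within-study variance, so
    # heterogeneous studies widen the pooled interval.
    w_re = [1 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    p_re = sum(wi * pi for wi, pi in zip(w_re, props)) / sw_re
    se = math.sqrt(1 / sw_re)
    return p_re, max(0.0, p_re - 1.96 * se), min(1.0, p_re + 1.96 * se)


if __name__ == "__main__":
    # Made-up toy counts (NOT data from the review): records re-identified / attacked.
    est, lo, hi = pooled_proportion_dl([5, 40, 2], [10, 50, 100])
    print(f"pooled = {est:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Under these assumptions, a positive between-study variance tau^2 inflates every study's variance, which is one way a small set of heterogeneous studies can yield intervals as wide as those reported above; clipping at 0 is one way a lower bound of exactly 0 can appear.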

CONCLUSIONS

The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that were not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
