Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review.

Authors

Chevrier Raphaël, Foufi Vasiliki, Gaudet-Blavignac Christophe, Robert Arnaud, Lovis Christian

Affiliations

Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland.

Faculty of Medicine, University of Geneva, Geneva, Switzerland.

Publication information

J Med Internet Res. 2019 May 31;21(5):e13484. doi: 10.2196/13484.

DOI:10.2196/13484

PMID:31152528

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6658290/

Abstract

BACKGROUND

The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients' privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find consensus on the definitions of these concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the balance between the risk to data subjects' privacy and the benefit of scientific advances.

OBJECTIVE

This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers.

METHODS

Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently.

RESULTS

The search returned 7972 records that matched at least one search term; 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposing views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is being made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data.
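
As a quick arithmetic check, the percentages in the Results can be recomputed from the reported counts. The sketch below is illustrative only: the helper name `percent` and the dictionary labels are assumptions introduced here, not part of the original study, and the rounding convention (whole percentages for the 60 articles, one decimal place for the 248 authors) is inferred from how the figures are printed.

```python
# Counts as reported in the Results paragraph: label -> (numerator, denominator).
# Labels are hypothetical shorthand chosen for this sketch.
reported = {
    "definitions provided": (29, 60),
    "both terms used": (41, 60),
    "each opposing group": (19, 60),
    "equivocal": (3, 60),
    "North America": (31, 60),
    "European Union": (22, 60),
    "computer science": (91, 248),
    "biomedical informatics": (47, 248),
    "medicine": (38, 248),
}


def percent(numerator: int, denominator: int, digits: int = 0) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


for label, (n, d) in reported.items():
    # Author-level figures (denominator 248) are printed with one decimal.
    digits = 1 if d == 248 else 0
    print(f"{label}: {n}/{d} = {percent(n, d, digits)}%")

# The two opposing groups plus the equivocal articles account for every
# article that used both terms: 19 + 19 + 3 = 41.
assert 2 * 19 + 3 == 41
```

Under these assumptions, the recomputed values agree with the percentages printed in the abstract.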

CONCLUSIONS

Interest in privacy-enhancing techniques is growing in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several laws, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. Using the definitions they provide could help address the variable use of these two concepts in the research community.
