Pseudonymization for research data collection: is the juice worth the squeeze?

Affiliation

Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.

Publication information

BMC Med Inform Decis Mak. 2019 Sep 4;19(1):178. doi: 10.1186/s12911-019-0905-x.

DOI:10.1186/s12911-019-0905-x

PMID:31484555

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6727563/

Abstract

BACKGROUND

The collection of data and biospecimens that characterize patients and probands in depth is a core element of modern biomedical research. Such data must be considered highly sensitive and needs to be protected from unauthorized use and re-identification. In this context, laws, regulations, guidelines and best practices often recommend or mandate pseudonymization, which means that directly identifying data about subjects (e.g. names and addresses) is stored separately from the data primarily needed for scientific analyses.
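As a rough illustration of the separation described above, the following minimal Python sketch splits one record into an identifying part and an analysis part that share only a random pseudonym. The field names, the in-memory dictionaries and the `pseudonymize` helper are hypothetical and chosen for illustration; they are not taken from the article or from any specific system.

```python
import secrets

def pseudonymize(record, identifying_fields=("name", "address")):
    """Split a record into identifying and research parts linked by a random pseudonym.

    `identifying_fields` is an assumption made for this sketch.
    """
    pseudonym = secrets.token_hex(8)  # 16 hex characters, not derived from the data
    identity = {k: v for k, v in record.items() if k in identifying_fields}
    research = {k: v for k, v in record.items() if k not in identifying_fields}
    return pseudonym, identity, research

# In a real deployment these would be separate systems; here they are
# two dictionaries keyed by pseudonym (hypothetical stand-ins).
identity_store = {}
research_store = {}

record = {"name": "Jane Doe", "address": "1 Example St", "diagnosis": "I10", "age": 54}
psn, identity, research = pseudonymize(record)
identity_store[psn] = identity
research_store[psn] = research
```

Analysts working only with `research_store` see the pseudonym and the scientific variables, while the name and address are held elsewhere.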

DISCUSSION

When (authorized) re-identification of subjects is not an exceptional but a common procedure, e.g. due to longitudinal data collection, implementing pseudonymization can significantly increase the complexity of software solutions. For example, data stored in distributed databases need to be dynamically combined with each other, which requires additional interfaces for communication between the various subsystems. This increased complexity may open new attack vectors for intruders, which runs counter to the objective of improving data protection. What is lacking is a standardized process for evaluating and reporting risks, threats and countermeasures, which could be used to test whether integrating pseudonymization methods into data collection systems actually improves upon the degree of protection provided by system designs that simply follow common IT security best practices and implement fine-grained role-based access control models. To demonstrate that the methods used to describe systems employing pseudonymized data management are currently heterogeneous and ad hoc, we examined the extent to which twelve recent studies address each of the six basic security properties defined by the International Organization for Standardization (ISO) standard 27000. We show inconsistencies across the studies, with most of them failing to mention one or more security properties.
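The dynamic combination of separately stored data mentioned above can be sketched as follows, under the assumption of two in-memory stores and a single role check. The role name, the store contents and the `reidentify` function are hypothetical; the sketch only shows where the additional interface and the access decision would sit, not how any of the reviewed systems implement them.

```python
ALLOWED_ROLES = {"study_nurse"}  # hypothetical role permitted to re-identify subjects

# Hypothetical stand-ins for two separate subsystems, keyed by pseudonym.
identity_store = {"a1b2": {"name": "Jane Doe"}}
research_store = {"a1b2": {"visit": 2, "hba1c": 6.9}}

def reidentify(pseudonym, role):
    """Combine identifying and research data for one subject, if the role allows it."""
    if role not in ALLOWED_ROLES:
        raise PermissionError(f"role {role!r} may not re-identify subjects")
    # This merge is the extra interface between subsystems; it is also
    # additional surface that has to be secured.
    return {**identity_store[pseudonym], **research_store[pseudonym]}
```

A call such as `reidentify("a1b2", "study_nurse")` returns the merged record, while any role outside `ALLOWED_ROLES` raises `PermissionError`.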

CONCLUSION

We discuss the degree of privacy protection provided by implementing pseudonymization in research data collection processes. We conclude that (1) more research is needed on the interplay of pseudonymity, information security and data protection, (2) problem-specific guidelines for evaluating and reporting risks, threats and countermeasures should be developed, and (3) future work on pseudonymized research data collection should include the results of such structured and integrated analyses.
