Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies.

Affiliation

Stanford Sleep Medicine Center, Redwood City, CA 94063-5704, USA.

Publication information

Med Care. 2012 Jul;50 Suppl(Suppl):S82-101. doi: 10.1097/MLR.0b013e3182585355.

DOI:10.1097/MLR.0b013e3182585355

PMID:22692265

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6502465/

Abstract

BACKGROUND

De-identification and anonymization are strategies used to remove patient identifiers from electronic health record data. These strategies are of paramount importance in multicenter research studies, which require sharing electronic health record data across multiple environments and institutions while safeguarding patient privacy.

METHODS

A systematic literature search was performed using the keywords de-identify, deidentify, de-identification, deidentification, anonymize, anonymization, data scrubbing, and text scrubbing. The search was conducted up to June 30, 2011 and covered 6 common literature databases. A total of 1798 prospective citations were identified, of which 94 met the criteria for review; the corresponding full-text articles were obtained. Search results were supplemented by 26 additional full-text articles, so that a total of 120 full-text articles were reviewed.

RESULTS

A final sample of 45 articles met inclusion criteria for review and discussion. Articles were grouped into text, images, and biological sample categories. For text-based strategies, the approaches were segregated into heuristic, lexical, and pattern-based systems versus statistical learning-based systems. For images, approaches that de-identified photographic facial images and magnetic resonance image data were described. For biological samples, approaches that managed the identifiers linked with these samples were discussed, particularly with respect to meeting the anonymization requirements needed for Institutional Review Board exemption under the Common Rule.

CONCLUSIONS

Current de-identification strategies have their limitations, and statistical learning-based systems have distinct advantages over other approaches for the de-identification of free text. True anonymization is challenging, and further work is needed in the areas of de-identification of datasets and protection of genetic information.
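
To make the distinction drawn in the Results more concrete, the following is a minimal sketch of what a pattern-based text scrubber might look like. It is not taken from any of the reviewed systems: the pattern list, the labels, and the `scrub` function are hypothetical and chosen only for illustration, and a real system would rely on far larger rule sets and dictionaries.

```python
import re

# Hypothetical, illustrative patterns only. Each entry pairs a placeholder
# label with a regular expression for one fixed-format identifier.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE)),
]


def scrub(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Seen on 06/30/2011. MRN: 12345678. Call 650-555-0100 or jdoe@example.org."
    print(scrub(note))
    # A name has no fixed format, so these rules leave it untouched.
    print(scrub("Dr. Smith reviewed the chart."))
```

Rules of this kind only catch identifiers that follow a predictable format. Free-text identifiers such as personal names pass through unchanged unless a dictionary or a learned model is added, which is one reason the choice between rule-based and statistical learning-based systems matters for free text.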
