Utility-preserving anonymization for health data publishing.

Author information

Lee Hyukki, Kim Soohyung, Kim Jong Wook, Chung Yon Dohn

Affiliations

Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea.

Department of IT Convergence, Korea University, Seoul, 145 Anam-ro, Seongbuk-gu, 02841, Republic of Korea.

Publication information

BMC Med Inform Decis Mak. 2017 Jul 11;17(1):104. doi: 10.1186/s12911-017-0499-0.

DOI:10.1186/s12911-017-0499-0

PMID:28693480

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5504813/

Abstract

BACKGROUND

Publishing raw electronic health records (EHRs) may be considered a breach of individuals' privacy because they usually contain sensitive information. A common practice in privacy-preserving data publishing is to anonymize the data before publication so that they satisfy a privacy model such as k-anonymity. Among the various anonymization techniques, generalization is the most commonly used in medical and health data processing. Generalization inevitably causes information loss, and various methods have been proposed to reduce it. However, existing generalization-based anonymization methods cannot avoid excessive information loss and therefore fail to preserve data utility.
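As a minimal illustration of the two notions used above, the sketch below checks k-anonymity on a toy table before and after generalizing one attribute. The records, the attribute names (`age`, `zip`, `diagnosis`) and the helper names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def generalize_age(age, width):
    """Replace an exact age with an interval of the given width, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical toy records; 'age' and 'zip' are treated as quasi-identifiers.
raw = [
    {"age": 31, "zip": "02841", "diagnosis": "flu"},
    {"age": 36, "zip": "02841", "diagnosis": "asthma"},
    {"age": 42, "zip": "02841", "diagnosis": "flu"},
    {"age": 47, "zip": "02841", "diagnosis": "diabetes"},
]

# Generalizing age to 10-year intervals merges records into larger groups.
generalized = [{**r, "age": generalize_age(r["age"], 10)} for r in raw]
```

Here the raw table is not 2-anonymous because every age is unique, whereas the generalized table is, at the cost of losing the exact ages. That loss is the information loss the abstract refers to.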

METHODS

We propose a utility-preserving anonymization method for privacy-preserving data publishing (PPDP). To preserve data utility, the proposed method comprises three parts: (1) a utility-preserving model, (2) counterfeit record insertion, and (3) a catalog of the counterfeit records. We also propose an anonymization algorithm that uses the proposed method and applies a full-domain generalization algorithm. We evaluate our method against an existing method on two aspects: information loss, measured through various quality metrics, and the error rate of analysis results.
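The abstract does not spell out how counterfeit records and their catalog are built, so the following is only a rough sketch of the general idea under stated assumptions: groups of records that share the same (already generalized) quasi-identifier values are padded with counterfeit records until each group has at least k members, and a catalog records how many counterfeits each group received. The padding rule, the choice of sensitive value for a counterfeit, the catalog layout, and the function name `pad_with_counterfeits` are all hypothetical.

```python
from collections import defaultdict


def pad_with_counterfeits(records, quasi_identifiers, sensitive, k):
    """Pad undersized groups with counterfeit records (illustrative assumption only).

    Returns the published records and a catalog mapping each padded
    quasi-identifier group to the number of counterfeits inserted into it.
    """
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)

    published, catalog = [], {}
    for key, members in groups.items():
        published.extend(members)
        missing = k - len(members)
        if missing > 0:
            for i in range(missing):
                # Assumption: a counterfeit copies the group's quasi-identifier
                # values and reuses a sensitive value already present in the group.
                fake = dict(zip(quasi_identifiers, key))
                fake[sensitive] = members[i % len(members)][sensitive]
                published.append(fake)
            catalog[key] = missing
    return published, catalog


# Hypothetical generalized table: one group has a single record.
table = [
    {"age": "30-39", "zip": "02841", "diagnosis": "flu"},
    {"age": "40-49", "zip": "02841", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "02841", "diagnosis": "flu"},
]
published, catalog = pad_with_counterfeits(table, ["age", "zip"], "diagnosis", k=2)
```

In this sketch an analyst who receives both `published` and `catalog` can subtract the cataloged counts from the group sizes to recover the true number of records per group, which is one plausible way a catalog could help keep analysis errors small.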

RESULTS

Across all types of quality metrics, the proposed method shows lower information loss than the existing method. In an analysis of real-world EHRs, the results obtained from data anonymized by the proposed method show only a small error relative to those obtained from the original data.
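The abstract does not name the quality metrics it uses. As one commonly used example of such a metric, the sketch below computes the discernibility metric, which sums the squared sizes of the groups of records sharing the same quasi-identifier values; a lower value indicates less information loss. The data and attribute names are hypothetical, and this metric is an illustrative choice, not necessarily one of those evaluated in the paper.

```python
from collections import Counter


def discernibility(records, quasi_identifiers):
    """Sum of squared equivalence-class sizes; smaller means finer-grained data."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sum(size * size for size in groups.values())


# Two hypothetical anonymizations of the same four records.
fine = [{"age": "30-39"}, {"age": "30-39"}, {"age": "40-49"}, {"age": "40-49"}]
coarse = [{"age": "30-49"}] * 4

fine_score = discernibility(fine, ["age"])      # two groups of size 2
coarse_score = discernibility(coarse, ["age"])  # one group of size 4
```

Both tables are 2-anonymous, but the coarser generalization scores higher (worse), which is the kind of comparison a quality metric makes possible.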

CONCLUSIONS

We propose a new utility-preserving anonymization method and an anonymization algorithm using the proposed method. Through experiments on various datasets, we show that the utility of EHRs anonymized by the proposed method is significantly better than that of EHRs anonymized by previous approaches.
