Privacy-preserving heterogeneous health data sharing.

Affiliation

Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada.

Publication

J Am Med Inform Assoc. 2013 May 1;20(3):462-9. doi: 10.1136/amiajnl-2012-001027. Epub 2012 Dec 13.

PMID:23242630

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3628047/

Abstract

OBJECTIVE

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy treat the privacy-preserving disclosure of relational data and of set-valued data as separate problems. In this paper, we propose an algorithm that handles both relational and set-valued data in the differentially private disclosure of healthcare data.
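
The guarantee referred to above has a standard formal definition, reproduced here for reference. This is a textbook statement, not quoted from the paper. A randomized algorithm $\mathcal{A}$ satisfies ε-differential privacy if, for every pair of datasets $D$ and $D'$ that differ in a single record and every set of outputs $S$, the first inequality below holds. The abstract does not name the noise distribution. The second line shows the Laplace mechanism, the standard way to meet this definition for numeric queries such as counts: it adds noise scaled to the query's sensitivity $\Delta f$.

```latex
\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{A}(D') \in S]

\mathcal{A}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad \Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1
```

A smaller ε means a stronger guarantee and more noise. For a count query, $\Delta f = 1$ when neighboring datasets differ by the addition or removal of one record.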

METHODS

The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy.
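
The two-step idea above (generalize first, then add noise) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm.

- **Data and taxonomy.** The toy `RECORDS`, the taxonomy `AGE_RANGES` / `CODE_GROUPS`, and all function names are hypothetical.
- **Generalization step.** Generalization here uses a fixed, data-independent taxonomy, so it spends no privacy budget. The paper instead generalizes probabilistically, and a data-dependent generalization step would itself have to be differentially private and consume part of ε.
- **Noise step.** Only the Laplace step below spends ε. Laplace noise is assumed as the noise distribution.

```python
import random
from collections import Counter

# Hypothetical toy records: (age, set of diagnosis codes, class label).
# Illustrative values only; they are not data from the paper.
RECORDS = [
    (34, {"E11", "I10"}, "Y"),
    (37, {"I10"}, "N"),
    (52, {"E11"}, "Y"),
    (58, {"J45"}, "N"),
    (61, {"E11", "J45"}, "Y"),
    (66, {"I10"}, "N"),
]

# Assumed, data-independent generalization taxonomy (hypothetical).
AGE_RANGES = ("[0-50)", "[50-100)")
CODE_GROUPS = ("has-E11", "no-E11")
LABELS = ("Y", "N")


def generalize_age(age: int) -> str:
    """Relational attribute: map an exact age to a coarse range."""
    return AGE_RANGES[0] if age < 50 else AGE_RANGES[1]


def generalize_codes(codes: set) -> str:
    """Set-valued attribute: map a set of codes to a coarse group."""
    return CODE_GROUPS[0] if "E11" in codes else CODE_GROUPS[1]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale), sampled as the difference of two Exp(rate=1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_generalized_counts(records, epsilon: float, seed: int = 0):
    rng = random.Random(seed)
    # Step 1: generalize every record, then count records per generalized cell.
    true_counts = Counter(
        (generalize_age(a), generalize_codes(c), y) for a, c, y in records
    )
    # Step 2: add noise to EVERY cell of the generalized domain, including
    # empty ones; releasing only non-empty cells would leak information.
    # Adding or removing one record changes exactly one cell by 1, so the
    # L1 sensitivity is 1 and a noise scale of 1/epsilon suffices.
    scale = 1.0 / epsilon
    released = {}
    for ag in AGE_RANGES:
        for cg in CODE_GROUPS:
            for y in LABELS:
                cell = (ag, cg, y)
                noisy = true_counts[cell] + laplace_noise(scale, rng)
                # Rounding and clipping are post-processing; they do not
                # weaken the differential privacy guarantee.
                released[cell] = max(0, round(noisy))
    return released


if __name__ == "__main__":
    for cell, count in noisy_generalized_counts(RECORDS, epsilon=1.0).items():
        print(cell, count)
```

The generalized domain here has 2 × 2 × 2 = 8 cells, so noise is added to only 8 counts. By contrast, a contingency table over exact ages 0 to 99, every subset of the three codes, and two labels would have 100 × 2^3 × 2 = 1,600 cells, each needing its own noisy count. The generalize-first design is meant to avoid that cost.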

RESULTS

We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis.

LIMITATION

The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases.
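
The back-of-the-envelope calculation below shows why utility can degrade as the output domain grows. It is an illustration, not an analysis from the paper, and it assumes independent Laplace noise $\eta_i \sim \mathrm{Lap}(1/\varepsilon)$ on each of $m$ released cells:

```latex
\mathbb{E}\big[\,|\eta_i|\,\big] = \frac{1}{\varepsilon},
\qquad
\mathbb{E}\Big[\sum_{i=1}^{m} |\eta_i|\Big] = \frac{m}{\varepsilon}
```

The true counts still sum to the number of records $n$, so the total noise grows with $m$ while the signal does not. For example, with $n = 10{,}000$, $\varepsilon = 1$ and $m = 10^6$ cells, the expected total absolute noise ($10^6$) is 100 times the number of records.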

CONCLUSIONS

Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.

Similar articles

DPSynthesizer: Differentially Private Data Synthesizer for Privacy Preserving Data Sharing.

Proceedings VLDB Endowment. 2014 Aug;7(13):1677-1680. doi: 10.14778/2733004.2733059.

Differential privacy in health research: A scoping review.

J Am Med Inform Assoc. 2021 Sep 18;28(10):2269-2276. doi: 10.1093/jamia/ocab135.

Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization.

Math Biosci Eng. 2021 Mar 29;18(4):3006-3033. doi: 10.3934/mbe.2021151.

An Efficient Big Data Anonymization Algorithm Based on Chaos and Perturbation Techniques.

Entropy (Basel). 2018 May 17;20(5):373. doi: 10.3390/e20050373.

Privacy-Preserving Search on Medical Data.

Stud Health Technol Inform. 2024 Aug 22;316:252-256. doi: 10.3233/SHTI240392.

Differentially private genome data dissemination through top-down specialization.

BMC Med Inform Decis Mak. 2014;14 Suppl 1(Suppl 1):S2. doi: 10.1186/1472-6947-14-S1-S2. Epub 2014 Dec 8.

A multicenter random forest model for effective prognosis prediction in collaborative clinical research network.

Artif Intell Med. 2020 Mar;103:101814. doi: 10.1016/j.artmed.2020.101814. Epub 2020 Feb 5.

PPSDT: A Novel Privacy-Preserving Single Decision Tree Algorithm for Clinical Decision-Support Systems Using IoT Devices.

Sensors (Basel). 2019 Jan 3;19(1):142. doi: 10.3390/s19010142.

Privacy-Preserving Hypothesis Testing for Reduced Cancer Risk on Daily Physical Activity.

J Med Syst. 2018 Apr 4;42(5):90. doi: 10.1007/s10916-018-0930-9.

Cited by

Privacy-Enhancing Technologies in Biomedical Data Science.

Annu Rev Biomed Data Sci. 2024 Aug;7(1):317-343. doi: 10.1146/annurev-biodatasci-120423-120107.

A Novel Privacy Paradigm for Improving Serial Data Privacy.

Sensors (Basel). 2022 Apr 6;22(7):2811. doi: 10.3390/s22072811.

Differential privacy in health research: A scoping review.

J Am Med Inform Assoc. 2021 Sep 18;28(10):2269-2276. doi: 10.1093/jamia/ocab135.

Differentially private release of medical microdata: an efficient and practical approach for preserving informative attribute values.

BMC Med Inform Decis Mak. 2020 Jul 8;20(1):155. doi: 10.1186/s12911-020-01171-5.

Selecting Optimal Subset to release under Differentially Private M-estimators from Hybrid Datasets.

IEEE Trans Knowl Data Eng. 2018 Mar 1;30(3):573-584. doi: 10.1109/TKDE.2017.2773545. Epub 2017 Nov 14.

Are My EHRs Private Enough? Event-Level Privacy Protection.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):103-112. doi: 10.1109/TCBB.2018.2850037. Epub 2018 Jun 25.

Machine Learning and Decision Support in Critical Care.

Proc IEEE Inst Electr Electron Eng. 2016 Feb;104(2):444-466. doi: 10.1109/JPROC.2015.2501978. Epub 2016 Jan 25.

Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States.

Ann N Y Acad Sci. 2017 Jan;1387(1):73-83. doi: 10.1111/nyas.13259. Epub 2016 Sep 28.

A multi-institution evaluation of clinical profile anonymization.

J Am Med Inform Assoc. 2016 Apr;23(e1):e131-7. doi: 10.1093/jamia/ocv154. Epub 2015 Nov 13.

A Privacy Preservation Model for Health-Related Social Networking Sites.

J Med Internet Res. 2015 Jul 8;17(7):e168. doi: 10.2196/jmir.3973.
