The disclosure of diagnosis codes can breach research participants' privacy.

Affiliation

Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee 37203, USA.

Publication information

J Am Med Inform Assoc. 2010 May-Jun;17(3):322-7. doi: 10.1136/jamia.2009.002725.

DOI:10.1136/jamia.2009.002725

PMID:20442151

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2995712/

Abstract

OBJECTIVE

De-identified clinical data in standardized form (eg, diagnosis codes), derived from electronic medical records, are increasingly combined with research data (eg, DNA sequences) and disseminated to enable scientific investigations. This study examines whether released data can be linked with identified clinical records that are accessible via various resources to jeopardize patients' anonymity, and the ability of popular privacy protection methodologies to prevent such an attack.

DESIGN

The study experimentally evaluates the re-identification risk of a de-identified sample of Vanderbilt's patient records involved in a genome-wide association study. It also measures the level of protection from re-identification, and data utility, provided by suppression and generalization.
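The two protection methods named above can be illustrated with a minimal sketch. It assumes that generalization means truncating each ICD-9 code to the category before the decimal point, and that suppression means dropping codes that occur in fewer than a chosen fraction of records. The function names, the `min_fraction` parameter, and the toy records are hypothetical and are not taken from the study.

```python
from collections import Counter

def generalize(record):
    """Truncate each ICD-9 code to its category, e.g. '250.01' -> '250' (assumed rule)."""
    return frozenset(code.split(".")[0] for code in record)

def suppress(records, min_fraction):
    """Drop every code that appears in fewer than min_fraction of the records (assumed rule)."""
    counts = Counter(code for record in records for code in record)
    threshold = min_fraction * len(records)
    keep = {code for code, n in counts.items() if n >= threshold}
    return [frozenset(record) & keep for record in records]

# Toy records: each patient is represented by a set of ICD-9 codes.
records = [
    {"250.01", "401.9"},
    {"250.02", "401.9", "272.4"},
    {"401.9", "493.90"},
]

generalized = [generalize(r) for r in records]
suppressed = suppress(generalized, min_fraction=0.5)
```

In this toy data, generalization merges `250.01` and `250.02` into `250`, and suppression at a 50% threshold then removes the codes that only one patient carries.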

MEASUREMENT

Privacy protection is quantified using the probability of re-identifying a patient in a larger population through diagnosis codes. Data utility is measured at a dataset level, using the percentage of retained information, as well as its description, and at a patient level, using two metrics based on the difference between the distribution of International Classification of Diseases (ICD) version 9 codes before and after applying privacy protection.
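One possible reading of these measurements is sketched here, under stated assumptions: the re-identification probability of a released record is taken as one over the number of population records that contain all of its codes, and retained information is taken as the share of code occurrences that survive protection. Both definitions, the function names, and the toy data are illustrative assumptions rather than the study's exact metrics.

```python
def reidentification_probability(released, population):
    """Return 1 / (number of population records containing every released code)."""
    matches = sum(1 for record in population if released <= record)
    return 1.0 / matches if matches else 0.0

def retained_percentage(original, protected):
    """Percentage of code occurrences still present after protection (assumed definition)."""
    total = sum(len(r) for r in original)
    kept = sum(len(r) for r in protected)
    return 100.0 * kept / total if total else 100.0

# Toy identified population, one set of codes per patient.
population = [
    frozenset({"250", "401"}),
    frozenset({"250", "401", "272"}),
    frozenset({"401", "493"}),
    frozenset({"401"}),
]

risk_common = reidentification_probability(frozenset({"250", "401"}), population)
risk_rare = reidentification_probability(frozenset({"272"}), population)

original = [frozenset({"250", "401", "272"}), frozenset({"401", "493"})]
protected = [frozenset({"250", "401"}), frozenset({"401"})]
retained = retained_percentage(original, protected)
```

A released record that matches two population records has a probability of 0.5, while a record carrying a code held by only one patient is uniquely identifiable.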

RESULTS

More than 96% of 2800 patients' records are shown to be uniquely identified by their diagnosis codes with respect to a population of 1.2 million patients. Generalization is shown to reduce the percentage of re-identifiable records by less than 2%, and over 99% of the three-digit ICD-9 codes need to be suppressed to prevent re-identification.
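To make the headline figure concrete, the share of uniquely identified records can be computed as sketched here. The sketch assumes a sample record counts as uniquely identified when exactly one population record contains all of its codes; the function names and the four-patient toy data are hypothetical and do not reproduce the study's dataset.

```python
def match_count(released, population):
    """Number of population records that contain every code of the released record."""
    return sum(1 for record in population if released <= record)

def percent_unique(sample, population):
    """Percentage of sample records matched by exactly one population record."""
    unique = sum(1 for r in sample if match_count(r, population) == 1)
    return 100.0 * unique / len(sample)

population = [
    frozenset({"250", "401"}),
    frozenset({"250", "401", "272"}),
    frozenset({"401", "493"}),
    frozenset({"401"}),
]
sample = [
    frozenset({"272"}),  # held by one patient
    frozenset({"493"}),  # held by one patient
    frozenset({"250"}),  # held by two patients
    frozenset({"401"}),  # held by all four patients
]

share_unique = percent_unique(sample, population)
```

Running the same computation after generalization or suppression would show how much each method lowers this percentage.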

CONCLUSIONS

Popular privacy protection methods are inadequate to deliver a sufficiently protected and useful result when sharing data derived from complex clinical systems. The development of alternative privacy protection models is thus required.

Similar articles

SynTEG: a framework for temporal structured electronic health data simulation.

J Am Med Inform Assoc. 2021 Mar 1;28(3):596-604. doi: 10.1093/jamia/ocaa262.

Anonymization of administrative billing codes with repeated diagnoses through censoring.

AMIA Annu Symp Proc. 2010 Nov 13;2010:782-6.

Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints.

J Biomed Inform. 2017 Jan;65:76-96. doi: 10.1016/j.jbi.2016.11.001. Epub 2016 Nov 8.

Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule.

J Am Med Inform Assoc. 2011 Jan-Feb;18(1):3-10. doi: 10.1136/jamia.2010.004622.

Design and implementation of a privacy preserving electronic health record linkage tool in Chicago.

J Am Med Inform Assoc. 2015 Sep;22(5):1072-80. doi: 10.1093/jamia/ocv038. Epub 2015 Jun 23.

Reducing patient re-identification risk for laboratory results within research datasets.

J Am Med Inform Assoc. 2013 Jan 1;20(1):95-101. doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.

Participation in patient support forums may put rare disease patient data at risk of re-identification.

Orphanet J Rare Dis. 2020 Aug 31;15(1):226. doi: 10.1186/s13023-020-01497-3.

A computational model to protect patient data from location-based re-identification.

Artif Intell Med. 2007 Jul;40(3):223-39. doi: 10.1016/j.artmed.2007.04.002. Epub 2007 Jun 1.

A multi-institution evaluation of clinical profile anonymization.

J Am Med Inform Assoc. 2016 Apr;23(e1):e131-7. doi: 10.1093/jamia/ocv154. Epub 2015 Nov 13.

Cited by

Multidimensional social signature de-anonymizes low-sensitivity data.

Sci Rep. 2025 Aug 29;15(1):31916. doi: 10.1038/s41598-025-16663-5.

Pseudonymisation of neuroimages and data protection: .

Neuroimage Rep. 2021 Sep 15;1(4):100053. doi: 10.1016/j.ynirp.2021.100053. eCollection 2021 Dec.

Privacy protection of sexually transmitted infections information from Chinese electronic medical records.

Sci Rep. 2025 Jan 8;15(1):1296. doi: 10.1038/s41598-024-84658-9.

Distributed non-disclosive validation of predictive models by a modified ROC-GLM.

BMC Med Res Methodol. 2024 Aug 29;24(1):190. doi: 10.1186/s12874-024-02312-4.

An Equity-Based Scoring System for Evaluating Surveillance-Related Harm in Public Health Crises.

Ethn Dis. 2023 Mar 31;33(1):63-75. doi: 10.18865/2022-2022. eCollection 2023 Jan.

Who owns (or controls) health data?

Sci Data. 2024 Feb 1;11(1):156. doi: 10.1038/s41597-024-02982-1.

[Re-identification potential of structured health data].

Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2024 Feb;67(2):164-170. doi: 10.1007/s00103-023-03820-2. Epub 2024 Jan 17.

Report of the Medical Image De-Identification (MIDI) Task Group -- Best Practices and Recommendations.

ArXiv. 2025 Mar 16:arXiv:2303.10473v3.

Managing re-identification risks while providing access to the All of Us research program.

J Am Med Inform Assoc. 2023 Apr 19;30(5):907-914. doi: 10.1093/jamia/ocad021.

Algorithms to anonymize structured medical and healthcare data: A systematic review.

Front Bioinform. 2022 Dec 22;2:984807. doi: 10.3389/fbinf.2022.984807. eCollection 2022.
