Reducing patient re-identification risk for laboratory results within research datasets.

Affiliation

Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37232-8340, USA.

Publication information

J Am Med Inform Assoc. 2013 Jan 1;20(1):95-101. doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.

DOI: 10.1136/amiajnl-2012-001026
PMID: 22822040
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3555327/
Abstract

OBJECTIVE

To try to lower patient re-identification risks for biomedical research databases containing laboratory test results while also minimizing changes in clinical data interpretation.

MATERIALS AND METHODS

In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61,280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning for each model's perturbed laboratory results were assessed.
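
The threat model and the simple random-offset perturbation described above can be illustrated with a minimal sketch. The abstract does not give the offset distribution, the distance measure, or the ranking rule, so everything here is an assumption made for illustration: the function names, the uniform relative offset, the sum-of-relative-differences distance, and the toy laboratory values are all hypothetical and are not the authors' implementation.

```python
import random

def perturb_random_offset(value, max_fraction, rng):
    """Shift a result by a uniform random offset of up to +/- max_fraction of its value.
    (Assumed form of a 'simple random offset'; the abstract does not specify one.)"""
    return value * (1.0 + rng.uniform(-max_fraction, max_fraction))

def rank_candidates(known_results, database):
    """Rank patient IDs by closeness to the attacker's known results.

    known_results: dict of test name -> value held by the attacker.
    database: dict of patient ID -> dict of test name -> released value.
    Distance is the sum of relative absolute differences (an assumption).
    """
    scores = []
    for patient_id, record in database.items():
        distance = 0.0
        for test, known in known_results.items():
            if test not in record:
                distance += 1.0  # assumed penalty for a missing test
            else:
                distance += abs(record[test] - known) / max(abs(known), 1e-9)
        scores.append((distance, patient_id))
    scores.sort()
    return [patient_id for _, patient_id in scores]

def reidentification_rate(original, released, tests):
    """Fraction of patients whose own released record is ranked first."""
    hits = 0
    for patient_id, record in original.items():
        known = {t: record[t] for t in tests}
        if rank_candidates(known, released)[0] == patient_id:
            hits += 1
    return hits / len(original)

# Hypothetical toy data: three patients, five results each.
original = {
    "p1": {"sodium": 140.0, "potassium": 4.1, "glucose": 95.0,
           "creatinine": 0.9, "hemoglobin": 13.5},
    "p2": {"sodium": 137.0, "potassium": 4.8, "glucose": 110.0,
           "creatinine": 1.2, "hemoglobin": 11.2},
    "p3": {"sodium": 142.0, "potassium": 3.6, "glucose": 88.0,
           "creatinine": 0.7, "hemoglobin": 15.1},
}
tests = ["sodium", "potassium", "glucose", "creatinine", "hemoglobin"]

# Release a copy in which every value is perturbed by up to 5%.
rng = random.Random(0)
released = {
    pid: {t: perturb_random_offset(v, 0.05, rng) for t, v in rec.items()}
    for pid, rec in original.items()
}

print("rate without perturbation:", reidentification_rate(original, original, tests))
print("rate with 5% offsets:", reidentification_rate(original, released, tests))
```

On unperturbed data each toy record matches itself exactly, so the attack succeeds for every patient; the perturbed rate depends on the offset size and on how distinct the records are.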

RESULTS

Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
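
One way to make "altered clinical meaning" concrete is to count how many results move to a different reference-range category (low, normal, high) after perturbation. The sketch below does this under stated assumptions: the reference ranges and sample values are hypothetical, and `category_preserving_offset` is a simple stand-in that resamples until the category is unchanged. It is not the expert-derived model from the paper, whose rules the abstract does not describe.

```python
import random

# Hypothetical reference ranges (low, high), for illustration only.
REFERENCE_RANGES = {"potassium": (3.5, 5.0), "sodium": (135.0, 145.0)}

def category(test, value):
    """Classify a result as low, normal, or high against its reference range."""
    low, high = REFERENCE_RANGES[test]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

def simple_offset(test, value, max_fraction, rng):
    """Uniform random relative offset that ignores clinical categories."""
    return value * (1.0 + rng.uniform(-max_fraction, max_fraction))

def category_preserving_offset(test, value, max_fraction, rng, attempts=10):
    """Stand-in for a meaning-preserving model: resample the offset until the
    perturbed value stays in the original category, else release it unchanged."""
    for _ in range(attempts):
        candidate = simple_offset(test, value, max_fraction, rng)
        if category(test, candidate) == category(test, value):
            return candidate
    return value

def changed_fraction(results, perturb, max_fraction, rng):
    """Fraction of results whose category differs after perturbation."""
    changed = sum(
        1
        for test, value in results
        if category(test, perturb(test, value, max_fraction, rng)) != category(test, value)
    )
    return changed / len(results)

# Hypothetical results, several of them close to a range boundary.
results = [
    ("potassium", 3.6), ("potassium", 4.9), ("potassium", 5.4),
    ("sodium", 136.0), ("sodium", 144.0), ("sodium", 132.0),
]

print("simple offsets:", changed_fraction(results, simple_offset, 0.2, random.Random(0)))
print("category-preserving:",
      changed_fraction(results, category_preserving_offset, 0.2, random.Random(0)))
```

By construction the stand-in never changes a category, which is stricter than the up to 4% reported for the expert algorithm; the point is only to show how such a metric can be computed and compared across perturbation schemes.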

DISCUSSION AND CONCLUSION

With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
