Evaluation of patient re-identification using laboratory test orders and mitigation via latent space variables.

Authors

Johnson Kipp W, De Freitas Jessica K, Glicksberg Benjamin S, Bobe Jason R, Dudley Joel T

Affiliations

Institute for Next Generation Healthcare, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, 770 Lexington Ave 15th Fl., New York, NY 10065, USA.
*Authors contributed equally.

Publication information

Pac Symp Biocomput. 2019;24:415-426.

PMID:30864342
Abstract

Anonymized electronic health records (EHR) are often used for biomedical research. One persistent concern with this type of research is the risk for re-identification of patients from their purportedly anonymized data. Here, we use the EHR of 731,850 de-identified patients to demonstrate that the average patient is unique from all others 98.4% of the time simply by examining what laboratory tests have been ordered for them. By the time a patient has visited the hospital on two separate days, they are unique in 72.3% of cases. We further present a computational study to identify how accurately the records from a single day of care can be used to re-identify patients from a set of 99 other patients. We show that, given a single visit's laboratory orders (even without result values) for a patient, we can re-identify the patient at least 25% of the time. Furthermore, we can place this patient among the top 10 most similar patients 47% of the time. Finally, we present a proof-of-concept technique using a variational autoencoder to encode laboratory results into a lower-dimensional latent space. We demonstrate that releasing latent-space-encoded laboratory orders significantly improves privacy compared to releasing raw laboratory orders (<5% re-identification), while preserving information contained within the laboratory orders (AUC of >0.9 for recreating encoded values). Our findings have potential consequences for the public release of anonymized laboratory tests to the biomedical research community. We note that our findings do not imply that laboratory tests alone are personally identifiable. In the attack scenario presented here, re-identification would require a threat actor to possess an external source of laboratory values which are linked to personal identifiers at the start.
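
The two measurements described in the abstract (how often a patient's set of ordered tests is unique, and where the true patient ranks among candidates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how similarity was computed, so the Jaccard similarity used here is an assumption, and the function names, patient IDs, and test codes are all hypothetical toy values.

```python
from collections import Counter


def unique_fraction(orders):
    """Fraction of patients whose set of ordered tests matches no other patient's.

    `orders` maps a patient ID to the set of test codes ordered for that patient.
    """
    counts = Counter(frozenset(tests) for tests in orders.values())
    unique = sum(1 for tests in orders.values() if counts[frozenset(tests)] == 1)
    return unique / len(orders)


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed metric, not taken from the paper)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_of_true_match(query_tests, true_id, candidates):
    """1-based rank of `true_id` when candidates are sorted by similarity to the query."""
    ranked = sorted(
        candidates,
        key=lambda pid: jaccard(query_tests, candidates[pid]),
        reverse=True,
    )
    return ranked.index(true_id) + 1


# Hypothetical toy data: patient ID -> set of ordered test codes.
orders = {
    "p1": {"CBC", "BMP"},
    "p2": {"CBC", "BMP"},
    "p3": {"CBC", "LIPID", "TSH"},
    "p4": {"HBA1C"},
}

print(unique_fraction(orders))  # p3 and p4 are unique, p1 and p2 collide
print(rank_of_true_match({"CBC", "LIPID"}, "p3", orders))
```

In this toy setup, a top-10 hit as reported in the abstract would correspond to `rank_of_true_match(...) <= 10` over a candidate pool of 100 patients. A real analysis would also need to handle ties between equally similar candidates, which this sketch resolves by insertion order.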


Similar articles

1
Evaluation of patient re-identification using laboratory test orders and mitigation via latent space variables.
Pac Symp Biocomput. 2019;24:415-426.
2
The Costs of Anonymization: Case Study Using Clinical Data.
J Med Internet Res. 2024 Apr 24;26:e49445. doi: 10.2196/49445.
3
Challenges and Insights in Using HIPAA Privacy Rule for Clinical Text Annotation.
AMIA Annu Symp Proc. 2015 Nov 5;2015:707-16. eCollection 2015.
4
A unified framework for evaluating the risk of re-identification of text de-identification tools.
J Biomed Inform. 2016 Oct;63:174-183. doi: 10.1016/j.jbi.2016.07.015. Epub 2016 Jul 15.
5
Patient Privacy in the Era of Big Data.
Balkan Med J. 2018 Jan 20;35(1):8-17. doi: 10.4274/balkanmedj.2017.0966. Epub 2017 Sep 13.
6
Nonspecific deidentification of date-like text in deidentified clinical notes enables reidentification of dates.
J Am Med Inform Assoc. 2022 Oct 7;29(11):1967-1971. doi: 10.1093/jamia/ocac147.
7
Criminal Prohibition of Wrongful Re-identification: Legal Solution or Minefield for Big Data?
J Bioeth Inq. 2017 Dec;14(4):527-539. doi: 10.1007/s11673-017-9806-9. Epub 2017 Sep 14.
8
Reducing patient re-identification risk for laboratory results within research datasets.
J Am Med Inform Assoc. 2013 Jan 1;20(1):95-101. doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.
9
The effect of defaults in an electronic health record on laboratory test ordering practices for pediatric patients.
Health Psychol. 2013 Sep;32(9):995-1002. doi: 10.1037/a0032925.
10
Generation of Surrogates for De-Identification of Electronic Health Records.
Stud Health Technol Inform. 2019 Aug 21;264:70-73. doi: 10.3233/SHTI190185.

Cited by

1
[Re-identification potential of structured health data].
Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2024 Feb;67(2):164-170. doi: 10.1007/s00103-023-03820-2. Epub 2024 Jan 17.
2
When Biology Gets Personal: Hidden Challenges of Privacy and Ethics in Biological Big Data.
Pac Symp Biocomput. 2019;24:386-390.