Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records.

Authors

Simon Gregory E, Shortreed Susan M, Coley R Yates, Penfold Robert B, Rossom Rebecca C, Waitzfelder Beth E, Sanchez Katherine, Lynch Frances L

Affiliations

Kaiser Permanente Washington Health Research Institute, Seattle, WA, US.

HealthPartners Institute, Minneapolis, MN, US.

Publication information

EGEMS (Wash DC). 2019 Mar 29;7(1):6. doi: 10.5334/egems.270.

DOI: 10.5334/egems.270
PMID: 30972355
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6450246/

Abstract

BACKGROUND

Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. But public sharing of such data can create risk of re-identifying individuals, exposing sensitive health information.

METHOD

We describe a framework for assessing re-identification risk that includes: identifying data elements in a research dataset that overlap with external data sources, identifying small classes of records defined by unique combinations of those data elements, and considering the pattern of population overlap between the research dataset and an external source. We also describe alternative strategies for mitigating risk when the external data source can or cannot be directly examined.
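
The second step of the framework, finding small classes of records defined by unique combinations of overlapping data elements, can be illustrated with a minimal sketch. This is not the authors' code: the field names (`sex`, `age_band`, `visit_year`), the toy records, and the threshold of 2 are all assumptions chosen for illustration, and the abstract does not state which elements or cutoff the authors used.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def small_classes(records, quasi_identifiers, threshold):
    """Return the combinations shared by fewer than `threshold` records."""
    sizes = class_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < threshold}


# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"sex": "F", "age_band": "30-39", "visit_year": 2014},
    {"sex": "F", "age_band": "30-39", "visit_year": 2014},
    {"sex": "M", "age_band": "30-39", "visit_year": 2014},
    {"sex": "F", "age_band": "70-79", "visit_year": 2015},
]

# Combinations held by a single record are the ones an external source
# with the same fields could most easily match to one person.
risky = small_classes(records, ["sex", "age_band", "visit_year"], threshold=2)
for combo, n in risky.items():
    print(combo, n)
```

Counting classes this way covers only the dataset side of the assessment. The framework's third step, the pattern of population overlap with an external source, would need information about that source and is not modeled here.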

RESULTS

We illustrate this framework using the example of a large database used to develop and validate models predicting suicidal behavior after an outpatient visit. We identify elements in the research dataset that might create risk and propose a specific risk mitigation strategy: deleting indicators for health system (a proxy for state of residence) and visit year.
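
The effect of the proposed mitigation, deleting the health system and visit year indicators, can be sketched by comparing the smallest class size before and after those fields are dropped. The records, the field names (`health_system`, `visit_year`, `sex`, `age_band`), and the helper functions below are hypothetical; the sketch shows the general idea only and does not reproduce the authors' dataset or results.

```python
from collections import Counter


def smallest_class(records, fields):
    """Size of the smallest group of records sharing the same values on `fields`."""
    sizes = Counter(tuple(r[f] for f in fields) for r in records)
    return min(sizes.values())


def drop_fields(records, to_drop):
    """Return copies of the records without the fields named in `to_drop`."""
    return [{k: v for k, v in r.items() if k not in to_drop} for r in records]


# Hypothetical toy records; every value here is made up for illustration.
records = [
    {"health_system": "A", "visit_year": 2013, "sex": "F", "age_band": "30-39"},
    {"health_system": "B", "visit_year": 2014, "sex": "F", "age_band": "30-39"},
    {"health_system": "A", "visit_year": 2014, "sex": "M", "age_band": "40-49"},
    {"health_system": "B", "visit_year": 2013, "sex": "M", "age_band": "40-49"},
]

before = smallest_class(
    records, ["health_system", "visit_year", "sex", "age_band"]
)

# Mitigation: remove the two indicators before sharing, then re-check.
shared = drop_fields(records, {"health_system", "visit_year"})
after = smallest_class(shared, ["sex", "age_band"])

print("smallest class before:", before)
print("smallest class after:", after)
```

In this toy data every record is unique on the full set of fields, while removing the two indicators leaves each remaining combination shared by two records. Whether a given class size is acceptable is a judgment the abstract leaves to the data holder.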

DISCUSSION

Researchers holding health system data must balance the public health value of data sharing against the duty to protect the privacy of health system members. Specific steps can provide a useful estimate of re-identification risk and point to effective risk mitigation strategies.
