Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records.

Authors

Simon Gregory E, Shortreed Susan M, Coley R Yates, Penfold Robert B, Rossom Rebecca C, Waitzfelder Beth E, Sanchez Katherine, Lynch Frances L

Affiliations

Kaiser Permanente Washington Health Research Institute, Seattle, WA, US.

HealthPartners Institute, Minneapolis, MN, US.

Publication

EGEMS (Wash DC). 2019 Mar 29;7(1):6. doi: 10.5334/egems.270.

Abstract

BACKGROUND

Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. However, public sharing of such data can create a risk of re-identifying individuals and exposing their sensitive health information.

METHOD

We describe a framework for assessing re-identification risk that includes: identifying data elements in a research dataset that overlap with external data sources, identifying small classes of records defined by unique combinations of those data elements, and considering the pattern of population overlap between the research dataset and an external source. We also describe alternative strategies for mitigating risk when the external data source can or cannot be directly examined.
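The second step of this framework, finding small classes of records defined by combinations of overlapping data elements, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`age_group`, `sex`, `site`, `visit_year`), the toy records, and the threshold of 2 are all hypothetical.

```python
from collections import Counter

def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def small_classes(records, quasi_identifiers, threshold):
    """Return the combinations shared by fewer than `threshold` records."""
    sizes = class_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < threshold}

# Hypothetical toy records; a real dataset would have many more rows and fields.
records = [
    {"age_group": "18-29", "sex": "F", "site": "A", "visit_year": 2012},
    {"age_group": "18-29", "sex": "F", "site": "A", "visit_year": 2012},
    {"age_group": "65+",   "sex": "M", "site": "B", "visit_year": 2014},
    {"age_group": "30-44", "sex": "M", "site": "A", "visit_year": 2013},
    {"age_group": "30-44", "sex": "M", "site": "A", "visit_year": 2013},
]

# Assumed set of elements that overlap with an external data source.
qi = ["age_group", "sex", "site", "visit_year"]
risky = small_classes(records, qi, threshold=2)
print(risky)  # {('65+', 'M', 'B', 2014): 1}
```

A record that is alone in its class is the easiest to link to an external source containing the same elements. In practice the threshold would be chosen to match the level of risk the data holder is willing to accept.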

RESULTS

We illustrate this framework using the example of a large database used to develop and validate models predicting suicidal behavior after an outpatient visit. We identify elements in the research dataset that might create risk and propose a specific risk mitigation strategy: deleting indicators for health system (a proxy for state of residence) and visit year.
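The effect of the proposed mitigation, deleting the health system and visit year indicators, can be illustrated by comparing the smallest class size before and after the deletion. The sketch below uses invented toy records and hypothetical field names (`health_system`, `visit_year`, `age_group`, `sex`); it is not drawn from the study data.

```python
from collections import Counter

def min_class_size(records, fields):
    """Size of the smallest group of records sharing the same values on `fields`."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())

def drop_fields(records, to_drop):
    """Return copies of the records without the fields named in `to_drop`."""
    return [{k: v for k, v in r.items() if k not in to_drop} for r in records]

# Hypothetical toy records; every row is unique on all four fields.
records = [
    {"age_group": "30-44", "sex": "F", "health_system": "A", "visit_year": 2012},
    {"age_group": "30-44", "sex": "F", "health_system": "B", "visit_year": 2012},
    {"age_group": "30-44", "sex": "F", "health_system": "A", "visit_year": 2013},
    {"age_group": "65+",   "sex": "M", "health_system": "B", "visit_year": 2013},
    {"age_group": "65+",   "sex": "M", "health_system": "A", "visit_year": 2014},
]

before = min_class_size(records, ["age_group", "sex", "health_system", "visit_year"])
shared = drop_fields(records, {"health_system", "visit_year"})
after = min_class_size(shared, ["age_group", "sex"])
print(before, after)  # 1 2
```

Removing the two fields merges classes that previously differed only by health system or year, so the smallest class grows. The cost is that analyses depending on those fields can no longer be run on the shared file.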

DISCUSSION

Researchers holding health system data must balance the public health value of data sharing against the duty to protect the privacy of health system members. Specific steps can provide a useful estimate of re-identification risk and point to effective risk mitigation strategies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/809d/6450246/43d8f6b6bd70/egems-7-1-270-g1.jpg
