The Costs of Anonymization: Case Study Using Clinical Data.

Authors

Pilgram Lisa, Meurers Thierry, Malin Bradley, Schaeffner Elke, Eckardt Kai-Uwe, Prasser Fabian

Affiliations

Junior Digital Clinician Scientist Program, Biomedical Innovation Academy, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany.

Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany.

Publication details

J Med Internet Res. 2024 Apr 24;26:e49445. doi: 10.2196/49445.

DOI:10.2196/49445
PMID:38657232
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11079766/

Abstract

BACKGROUND

Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they can no longer be reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice.

OBJECTIVE

The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study.

METHODS

The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results.
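
The transformation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a record's reidentification risk is 1 divided by the number of records sharing its generalized quasi-identifier values, and the function names, the age-band generalization, and the toy records are all hypothetical. Under that assumption, a threshold of 0.02 would require groups of at least 50 records, while a threshold of 1 would suppress nothing.

```python
from collections import Counter


def generalize_age(age, width):
    """Map an exact age to a band of the given width, e.g. 47 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records, quasi_identifiers, generalizers, risk_threshold):
    """Generalize quasi-identifiers, then suppress high-risk records.

    Assumption: risk of a record = 1 / size of its equivalence class,
    where a class is the set of records with identical generalized
    quasi-identifier values.
    """
    generalized = []
    for rec in records:
        new_rec = dict(rec)
        for col in quasi_identifiers:
            new_rec[col] = generalizers[col](rec[col])
        generalized.append(new_rec)

    def class_key(rec):
        return tuple(rec[col] for col in quasi_identifiers)

    class_sizes = Counter(class_key(rec) for rec in generalized)
    # Suppression: drop records whose risk exceeds the threshold.
    return [
        rec for rec in generalized
        if 1.0 / class_sizes[class_key(rec)] <= risk_threshold
    ]


# Hypothetical toy data, not taken from the GCKD study.
records = [
    {"age": 41, "sex": "F", "egfr": 52},
    {"age": 47, "sex": "F", "egfr": 48},
    {"age": 45, "sex": "F", "egfr": 61},
    {"age": 63, "sex": "M", "egfr": 39},
]
generalizers = {
    "age": lambda a: generalize_age(a, 10),  # 10-year age bands
    "sex": lambda s: s,                      # left unchanged
}

strict = anonymize(records, ["age", "sex"], generalizers, risk_threshold=0.5)
lenient = anonymize(records, ["age", "sex"], generalizers, risk_threshold=1.0)
print(len(strict), len(lenient))
```

With the stricter threshold, the single record in the 60-69 male group is suppressed because its group size is 1; with a threshold of 1, every record is kept in generalized form.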

RESULTS

Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy.
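
To make the CI overlap measure concrete, the sketch below computes one common form of interval overlap: the length of the intersection of the two intervals, expressed relative to each interval's own length and averaged. The abstract does not spell out the exact formula used in the study, so this definition and the function name `ci_overlap` are assumptions for illustration only.

```python
def ci_overlap(original, anonymized):
    """Return the overlap of two confidence intervals as a value in [0, 1].

    Each interval is a (lower, upper) tuple. The overlap is the length of
    the intersection divided by each interval's length, averaged over both
    intervals (an assumed definition, not necessarily the study's).
    """
    lo_o, hi_o = original
    lo_a, hi_a = anonymized
    intersection = min(hi_o, hi_a) - max(lo_o, lo_a)
    if intersection <= 0:
        # The intervals do not overlap (or only touch at a point).
        return 0.0
    return 0.5 * (intersection / (hi_o - lo_o) + intersection / (hi_a - lo_a))


print(ci_overlap((1.0, 2.0), (1.0, 2.0)))  # identical intervals
print(ci_overlap((0.0, 2.0), (1.0, 3.0)))  # partial overlap
print(ci_overlap((0.0, 1.0), (2.0, 3.0)))  # disjoint intervals
```

Under this definition, identical intervals score 1, disjoint intervals score 0, and an estimate whose anonymized CI covers half of the original CI (and vice versa) scores 0.5.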

CONCLUSIONS

Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data.

TRIAL REGISTRATION

German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1093/ndt/gfr456.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/11079766/32de90999546/jmir_v26i1e49445_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/11079766/9ae8edfc7ea1/jmir_v26i1e49445_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/11079766/fd6351398e92/jmir_v26i1e49445_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/11079766/3cb97d1fc9f1/jmir_v26i1e49445_fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/11079766/90abeab5068f/jmir_v26i1e49445_fig5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/11079766/1b8f69be7a4c/jmir_v26i1e49445_fig6.jpg
