A proposed de-identification framework for a cohort of children presenting at a health facility in Uganda.

Author information

Mawji Alishah, Longstaff Holly, Trawin Jessica, Dunsmuir Dustin, Komugisha Clare, Novakowski Stefanie K, Wiens Matthew O, Akech Samuel, Tagoola Abner, Kissoon Niranjan, Ansermino J Mark

Affiliations

Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada.

Centre for International Child Health, BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada.

Publication information

PLOS Digit Health. 2022 Aug 24;1(8):e0000027. doi: 10.1371/journal.pdig.0000027. eCollection 2022 Aug.

DOI: 10.1371/journal.pdig.0000027
PMID: 36812586
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9931294/
Abstract

Data sharing has enormous potential to accelerate research and improve its accuracy, strengthen collaborations, and restore trust in the clinical research enterprise. Nevertheless, there remains reluctance to openly share raw data sets, in part due to concerns regarding research participant confidentiality and privacy. Statistical data de-identification is an approach that can be used to preserve privacy and facilitate open data sharing. We have proposed a standardized framework for the de-identification of data generated from cohort studies in children in a low- and middle-income country. We applied a standardized de-identification framework to data sets comprising 241 health-related variables collected from a cohort of 1750 children with acute infections from Jinja Regional Referral Hospital in Eastern Uganda. Variables were labeled as direct identifiers or quasi-identifiers based on the conditions of replicability, distinguishability, and knowability, with consensus from two independent evaluators. Direct identifiers were removed from the data sets, while a statistical risk-based de-identification approach using the k-anonymity model was applied to quasi-identifiers. Qualitative assessment of the level of privacy invasion associated with data set disclosure was used to determine an acceptable re-identification risk threshold and the corresponding k-anonymity requirement. A de-identification model using generalization followed by suppression was applied in a logical, stepwise manner to achieve k-anonymity. The utility of the de-identified data was demonstrated using a typical clinical regression example. The de-identified data sets were published on the Pediatric Sepsis Data CoLaboratory Dataverse, which provides moderated data access. Researchers face many challenges when providing access to clinical data. We provide a standardized de-identification framework that can be adapted and refined based on specific context and risks. This process will be combined with moderated access to foster coordination and collaboration in the clinical research community.
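The abstract's core steps can be sketched in a few lines of Python. In order, these are: remove direct identifiers, derive k from an acceptable re-identification risk threshold, generalize quasi-identifiers, and then suppress records whose equivalence class is still smaller than k.

This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for the example:
- the field names (`name`, `phone`, `age_months`, `sex`, `village`)
- the candidate age-band widths
- the 5% suppression budget
- the toy records

It also assumes the common k-anonymity convention that a record's worst-case re-identification risk is 1/k, so k = ⌈1/threshold⌉. The abstract does not state the threshold or k the study actually used.

```python
from collections import Counter
from math import ceil

# Hypothetical field names for illustration; not the study's actual variables.
DIRECT_IDENTIFIERS = {"name", "phone"}
QUASI_IDENTIFIERS = ("age_band", "sex", "village")


def k_from_risk_threshold(max_risk):
    """Smallest k whose worst-case re-identification risk (1/k) is <= max_risk."""
    # The small epsilon absorbs floating-point error, e.g. in 1 / 0.2.
    return ceil(1 / max_risk - 1e-9)


def generalize(record, band_months):
    """Drop direct identifiers and coarsen exact age into a band of band_months."""
    out = {key: val for key, val in record.items() if key not in DIRECT_IDENTIFIERS}
    low = (out.pop("age_months") // band_months) * band_months
    out["age_band"] = f"{low}-{low + band_months - 1}"
    return out


def class_key(record):
    """Quasi-identifier values that define a record's equivalence class."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def suppress(records, k):
    """Remove records whose equivalence class has fewer than k members."""
    sizes = Counter(class_key(r) for r in records)
    return [r for r in records if sizes[class_key(r)] >= k]


def is_k_anonymous(records, k):
    """True if every equivalence class contains at least k records."""
    sizes = Counter(class_key(r) for r in records)
    return all(size >= k for size in sizes.values())


def deidentify(records, max_risk, band_widths=(6, 12, 24), max_suppressed=0.05):
    """Generalize stepwise (coarser age bands), then suppress, until the
    share of suppressed records is within max_suppressed (hypothetical budget)."""
    k = k_from_risk_threshold(max_risk)
    for width in band_widths:
        kept = suppress([generalize(r, width) for r in records], k)
        if len(records) - len(kept) <= max_suppressed * len(records):
            break  # least generalization that meets the suppression budget
    # If no level meets the budget, the coarsest level tried is returned.
    return kept, width, k


TOY_RECORDS = [  # invented toy data, not from the study
    {"name": "A", "phone": "01", "age_months": 8, "sex": "F", "village": "V1"},
    {"name": "B", "phone": "02", "age_months": 10, "sex": "F", "village": "V1"},
    {"name": "C", "phone": "03", "age_months": 14, "sex": "F", "village": "V1"},
    {"name": "D", "phone": "04", "age_months": 30, "sex": "M", "village": "V2"},
    {"name": "E", "phone": "05", "age_months": 33, "sex": "M", "village": "V2"},
    {"name": "F", "phone": "06", "age_months": 40, "sex": "M", "village": "V2"},
]

if __name__ == "__main__":
    kept, width, k = deidentify(TOY_RECORDS, max_risk=0.5)
    print(width, k, len(kept), is_k_anonymous(kept, k))  # prints: 24 2 6 True
```

Trying the least aggressive generalization first keeps as much detail as possible. Suppression then removes only the remaining small equivalence classes. This ordering mirrors the abstract's "generalization followed by suppression", but the specific search over band widths is an assumption. Real projects typically use richer generalization hierarchies and dedicated anonymization tools.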

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d101/9931294/048454de2f9c/pdig.0000027.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d101/9931294/696e98e34d3f/pdig.0000027.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d101/9931294/8aa7e109dd29/pdig.0000027.g002.jpg
