
Privacy of Study Participants in Open-access Health and Demographic Surveillance System Data: Requirements Analysis for Data Anonymization.

Affiliations

Institute of Data Analysis and Process Design, Zurich University of Applied Sciences, Winterthur, Switzerland.

Department of Population Health, London School of Hygiene and Tropical Medicine, Lilongwe, Malawi.

Publication information

JMIR Public Health Surveill. 2022 Sep 2;8(9):e34472. doi: 10.2196/34472.

DOI: 10.2196/34472
PMID: 36053573
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9482064/
Abstract

BACKGROUND

Data anonymization and sharing have become popular topics for individuals, organizations, and countries worldwide. Open-access sharing of anonymized data containing sensitive information about individuals makes the most sense whenever the utility of the data can be preserved and the risk of disclosure can be kept below acceptable levels. In this case, researchers can use the data without access restrictions and limitations.

OBJECTIVE

This study aimed to highlight the requirements and possible solutions for sharing health surveillance event history data. The challenges lie in the anonymization of multiple event dates and time-varying variables.

METHODS

A sequential approach that adds noise to event dates is proposed. This approach maintains the event order and preserves the average time between events. In addition, a nosy neighbor distance-based matching approach to estimate the risk is proposed. Regarding the key variables that change over time, such as educational level or occupation, we make 2 proposals: one based on limiting the intermediate statuses of the individual and the other to achieve k-anonymity in subsets of the data. The proposed approaches were applied to the Karonga health and demographic surveillance system (HDSS) core residency data set, which contains longitudinal data from 1995 to the end of 2016 and includes 280,381 events with time-varying socioeconomic variables and demographic information.
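The sequential idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `add_sequential_noise`, the `max_shift_days` parameter, and the uniform noise distribution are assumptions made for illustration. The sketch shifts the first event date, then perturbs each gap between consecutive events with symmetric zero-mean noise that is bounded so a gap never drops below one day. Event order is therefore kept, and the expected gap length equals the original gap.

```python
import random
from datetime import date, timedelta


def add_sequential_noise(event_dates, max_shift_days=30, rng=None):
    """Perturb one individual's event dates while keeping their order.

    Illustrative sketch only; name, parameters, and noise distribution
    are assumptions, not taken from the paper.
    """
    rng = rng or random.Random()
    if not event_dates:
        return []
    dates = sorted(event_dates)
    # Shift the first event by a uniform offset in [-max_shift, +max_shift].
    first_shift = rng.randint(-max_shift_days, max_shift_days)
    noisy = [dates[0] + timedelta(days=first_shift)]
    for prev, curr in zip(dates, dates[1:]):
        gap = (curr - prev).days
        if gap == 0:
            # Same-day events stay on the same (noisy) day.
            noisy_gap = 0
        else:
            # Symmetric bound keeps the noise zero-mean and the gap >= 1 day.
            bound = min(max_shift_days, gap - 1)
            noisy_gap = gap + rng.randint(-bound, bound)
        # Each noisy date is built on the previous noisy date (sequential).
        noisy.append(noisy[-1] + timedelta(days=noisy_gap))
    return noisy


if __name__ == "__main__":
    history = [date(1995, 3, 1), date(1998, 7, 15), date(2010, 1, 1)]
    print(add_sequential_noise(history, max_shift_days=30, rng=random.Random(1)))
```

Because every gap is perturbed independently, the offset of a later event from its original date can accumulate beyond `max_shift_days`; a real release would need to decide whether that drift is acceptable.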

RESULTS

An anonymized version of the event history data, including longitudinal information on individuals over time, with high data utility, was created.

CONCLUSIONS

The proposed anonymization of event history data comprising static and time-varying variables, applied to HDSS data, yielded acceptable disclosure risk and preserved utility, so the data can be shared as public use data. High utility was achieved even with the highest level of noise added to the core event dates. The details are important to ensure consistency and credibility. Importantly, the sequential noise addition approach presented in this study not only maintains the event order recorded in the original data but also preserves the average time between events. We proposed an approach that preserves the data utility well but limits the number of response categories for the time-varying variables. Furthermore, using distance-based neighborhood matching, we simulated an attack under a nosy neighbor situation and under a worst-case scenario in which attackers have full information on the original data. We showed that the disclosure risk is very low, even when assuming that the attacker's database and information are optimal. The HDSS and medical science research communities in low- and middle-income country settings will be the primary beneficiaries of the results and methods presented in this paper; however, the results will be useful for anyone working on anonymizing longitudinal event history data with time-varying variables for the purposes of sharing.
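The worst-case attack described above, in which the attacker holds the full original data, can be approximated with a simple nearest-neighbor matching check. The sketch below is an assumption-laden illustration rather than the paper's risk measure: the function name `matching_risk`, the use of Euclidean distance, and the representation of each record as a numeric tuple are all hypothetical. A record counts as re-identified when no anonymized record is closer to the attacker's original record than its true counterpart.

```python
import math


def matching_risk(original, anonymized):
    """Share of original records whose nearest anonymized record is their own.

    Illustrative sketch: records are equal-length numeric tuples, and
    original[i] corresponds to anonymized[i]. Ties count as a match,
    which is the pessimistic (attacker-friendly) choice.
    """
    if len(original) != len(anonymized):
        raise ValueError("both data sets must contain the same records")
    if not original:
        return 0.0
    hits = 0
    for i, record in enumerate(original):
        # Distance from the attacker's record to every released record.
        distances = [math.dist(record, released) for released in anonymized]
        if distances[i] <= min(distances):
            hits += 1
    return hits / len(original)


if __name__ == "__main__":
    attacker_view = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
    released = [(0.5, -0.5), (9.0, 11.0), (21.0, 19.5)]
    print(matching_risk(attacker_view, released))
```

In practice the variables would need to be put on comparable scales before computing distances, and event dates would first be converted to numbers such as days since a reference date.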

