BACKGROUND

Data anonymization and sharing have become popular topics for individuals, organizations, and countries worldwide. Open-access sharing of anonymized data containing sensitive information about individuals makes the most sense whenever the utility of the data can be preserved and the risk of disclosure can be kept below acceptable levels. In this case, researchers can use the data without access restrictions.

OBJECTIVE

This study aimed to highlight the requirements and possible solutions for sharing health surveillance event history data. The challenges lie in the anonymization of multiple event dates and time-varying variables.

METHODS

A sequential approach that adds noise to event dates is proposed. This approach maintains the event order and preserves the average time between events. In addition, a distance-based matching approach under a nosy neighbor scenario is proposed to estimate the disclosure risk. For key variables that change over time, such as educational level or occupation, we make 2 proposals: one limits the intermediate statuses of the individual, and the other achieves k-anonymity in subsets of the data. The proposed approaches were applied to the Karonga health and demographic surveillance system (HDSS) core residency data set, which contains longitudinal data from 1995 to the end of 2016 and includes 280,381 events with time-varying socioeconomic variables and demographic information.
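The abstract does not spell out the noise mechanism, so the following is only a minimal sketch of one way sequential noise addition could work, not the authors' implementation. It assumes uniform integer noise in days, a hypothetical `max_noise_days` parameter, and a hypothetical function name `add_sequential_noise`. The first date is shifted, and each later date is rebuilt from the previous noisy date plus a perturbed gap. The noise on each gap is symmetric and capped so that a positive gap stays positive, which keeps the event order and leaves the expected gap unchanged.

```python
import random
from datetime import date, timedelta


def add_sequential_noise(event_dates, max_noise_days=30, rng=None):
    """Perturb a chronologically sorted list of dates while keeping their order.

    Illustrative sketch only: the noise distribution and the cap on the
    noise are assumptions, not taken from the paper.
    """
    rng = rng or random.Random()
    if not event_dates:
        return []

    # Shift the first event by symmetric noise.
    shift = rng.randint(-max_noise_days, max_noise_days)
    noisy = [event_dates[0] + timedelta(days=shift)]

    for prev, curr in zip(event_dates, event_dates[1:]):
        gap = (curr - prev).days
        if gap < 0:
            raise ValueError("event_dates must be sorted in ascending order")
        # Symmetric bound, capped so a positive gap never drops below 1 day;
        # same-day events (gap 0) stay on the same day.
        bound = min(max_noise_days, max(gap - 1, 0))
        noisy_gap = gap + rng.randint(-bound, bound)
        # Build on the previous noisy date, so order is preserved by construction.
        noisy.append(noisy[-1] + timedelta(days=noisy_gap))

    return noisy


if __name__ == "__main__":
    history = [date(2000, 1, 1), date(2000, 1, 3), date(2000, 6, 1), date(2001, 1, 1)]
    print(add_sequential_noise(history, max_noise_days=30, rng=random.Random(0)))
```

Because each gap is perturbed symmetrically around its original value, the average time between events is preserved in expectation rather than exactly for any single individual.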

RESULTS

An anonymized version of the event history data, including longitudinal information on individuals over time, with high data utility, was created.

CONCLUSIONS

Applying the proposed anonymization of event history data, comprising static and time-varying variables, to HDSS data yielded acceptable disclosure risk and preserved utility, so the data can be shared as public use data. High utility was achieved even with the highest level of noise added to the core event dates. Such details are important to ensure consistency or credibility. Importantly, the sequential noise addition approach presented in this study not only maintains the event order recorded in the original data but also preserves the average time between events. We proposed an approach that preserves the data utility well but limits the number of response categories for the time-varying variables. Furthermore, using distance-based neighborhood matching, we simulated an attack under a nosy neighbor scenario and under a worst-case scenario in which attackers have full information on the original data. We showed that the disclosure risk is very low, even when assuming that the attacker's database and information are optimal. The HDSS and medical science research communities in low- and middle-income country settings will be the primary beneficiaries of the results and methods presented in this paper; however, the results will be useful for anyone anonymizing longitudinal event history data with time-varying variables for the purpose of sharing.
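The worst-case attack described above can be illustrated with a small sketch. It is not the authors' procedure: it assumes each record is reduced to a numeric feature vector (for example, days between events), uses Euclidean distance, and the function name `reidentification_rate` is hypothetical. The attacker holds the original records and links each one to its nearest anonymized record; the share of correct links serves as a rough risk estimate.

```python
import math


def reidentification_rate(original, anonymized):
    """Share of original records whose nearest anonymized record is their own.

    Both arguments are equal-length lists of numeric tuples, where record i
    in `anonymized` is the perturbed version of record i in `original`.
    Euclidean distance is an assumption made for this sketch.
    """
    if len(original) != len(anonymized):
        raise ValueError("both data sets must contain the same records")
    if not original:
        return 0.0

    hits = 0
    for i, record in enumerate(original):
        distances = [math.dist(record, candidate) for candidate in anonymized]
        # Index of the closest anonymized record (first one wins on ties).
        nearest = min(range(len(anonymized)), key=distances.__getitem__)
        if nearest == i:
            hits += 1
    return hits / len(original)


if __name__ == "__main__":
    original = [(0, 0), (10, 10), (20, 20)]
    lightly_perturbed = [(1, 0), (9, 11), (19, 19)]
    print(reidentification_rate(original, lightly_perturbed))
```

A rate close to the chance level of 1/n would suggest that the perturbation hides individual records well, whereas a rate near 1 would indicate that the added noise is too small to prevent linkage.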

Privacy of Study Participants in Open-access Health and Demographic Surveillance System Data: Requirements Analysis for Data Anonymization.
