Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea.
BMC Med Inform Decis Mak. 2020 Jul 8;20(1):155. doi: 10.1186/s12911-020-01171-5.
Various methods based on k-anonymity have been proposed for publishing medical data while preserving privacy. However, the k-anonymity property assumes that adversaries possess fixed background knowledge. Although differential privacy overcomes this limitation, it is specialized for aggregated results. Thus, it is difficult to obtain high-quality microdata. To address this issue, we propose a differentially private medical microdata release method featuring high utility.
We propose a method of anonymizing medical data under differential privacy. To improve data utility, especially by preserving informative attribute values, the proposed method adopts three data perturbation approaches: (1) generalization, (2) suppression, and (3) insertion. The proposed method produces an anonymized dataset that is nearly optimal with regard to utility, while preserving privacy.
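The three perturbation approaches can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple noisy-histogram release in which ages are generalized into fixed-width ranges, each group count is perturbed with Laplace noise of scale 1/epsilon, and the rounded noisy count determines whether records are suppressed (noisy count below the true count) or inserted (noisy count above it). The names `generalize_age`, `laplace_noise`, and `release`, the record layout, and the group domain are all hypothetical.

```python
import random
from collections import Counter


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a fixed-width range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(records, domain, epsilon, rng):
    """Release generalized records whose group counts carry Laplace noise.

    Assumption: neighboring datasets differ by one record, so each
    histogram count has sensitivity 1 and the noise scale is 1/epsilon.
    Every group in `domain` is perturbed, including empty ones.
    """
    true_counts = Counter((generalize_age(age), dx) for age, dx in records)
    released = []
    for group in domain:
        noisy = round(true_counts[group] + laplace_noise(1.0 / epsilon, rng))
        # Suppression: fewer copies than the true count are emitted.
        # Insertion: more copies than the true count are emitted.
        released.extend([group] * max(noisy, 0))
    return released


if __name__ == "__main__":
    # Toy (age, diagnosis) records, invented for illustration only.
    records = [(34, "flu"), (37, "flu"), (36, "asthma"), (52, "flu")]
    domain = [(r, d) for r in ("30-39", "50-59") for d in ("flu", "asthma")]
    print(release(records, domain, epsilon=1.0, rng=random.Random(0)))
```

Rounding and clamping at zero are post-processing steps, so they do not weaken the privacy guarantee of the noisy counts. A smaller `epsilon` gives stronger privacy but causes more suppressed and inserted records.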
The proposed method achieves lower information loss than existing methods. In a real-world case study, we show that the results of data analyses on the original dataset closely match those obtained on a dataset anonymized via the proposed method.
We propose a novel differentially private anonymization method that preserves informative values for the release of medical data. Through experiments, we show that medical data anonymized via the proposed method has significantly higher utility than data anonymized via existing methods.