Gong Qiyuan, Luo Junzhou, Yang Ming, Ni Weiwei, Li Xiao-Bai
Southeast University, Nanjing, China.
University of Massachusetts Lowell, Massachusetts, USA.
Knowl Based Syst. 2017 Jan 1;115:15-26. doi: 10.1016/j.knosys.2016.10.012. Epub 2016 Oct 21.
Preserving privacy and utility during data publishing and data mining is essential for individuals, data providers and researchers. However, studies in this area typically assume that each individual has only one record in a dataset, which is unrealistic in many applications. When an individual has multiple records, new privacy leakages arise. We call such a dataset a 1:M dataset. In this paper, we propose a novel privacy model called (k, l)-diversity that addresses disclosure risks in 1:M data publishing. Based on this model, we develop an efficient algorithm named 1:M-Generalization to preserve privacy and data utility, and compare it with alternative approaches. Extensive experiments on real-world data show that our approach outperforms the state-of-the-art technique in terms of data utility and computational cost.