Serra Edoardo, Vaidya Jaideep, Akella Haritha, Sharma Ashish
CS Department, Boise State University, USA.
MSIS Department, Rutgers University, USA.
ICT Syst Secur Priv Prot (2017). 2017 May;502:506-519. doi: 10.1007/978-3-319-58469-0_34. Epub 2017 May 4.
Frequent itemset mining is a fundamental data analytics task. In many cases, due to privacy concerns, only the frequent itemsets are released instead of the underlying data. However, it is not clear how to evaluate the privacy implications of disclosing the frequent itemsets. To address this, we define the k-distant-IFM-solutions problem (IFM: inverse frequent itemset mining), which aims to find k transaction datasets, each consistent with the released itemsets, whose pairwise distance is maximized. The degree of difference between the reconstructed datasets provides a way to evaluate the privacy risk. Since the problem is NP-hard, we propose a 2-approximate solution as well as faster heuristics, and evaluate them on real data.
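As a rough illustration of the setting, the sketch below shows two different transaction datasets that yield exactly the same frequent itemsets and supports, so the released itemsets alone do not determine the underlying data. It is a toy example, not the paper's algorithm: the function names `frequent_itemsets` and `dataset_distance`, the brute-force enumeration, the toy datasets, and the distance measure (size of the symmetric difference between the datasets viewed as multisets of transactions) are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets with support >= min_support.

    Brute-force enumeration; only suitable for tiny toy datasets.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def dataset_distance(d1, d2):
    """Hypothetical distance: number of transactions in one dataset but not
    the other, treating each dataset as a multiset of transactions."""
    c1 = Counter(frozenset(t) for t in d1)
    c2 = Counter(frozenset(t) for t in d2)
    # Counter subtraction keeps only positive counts on each side.
    return sum(((c1 - c2) + (c2 - c1)).values())


# Two toy datasets (assumed for illustration) that differ in one transaction.
d1 = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}]
d2 = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]

released_1 = frequent_itemsets(d1, min_support=2)
released_2 = frequent_itemsets(d2, min_support=2)

print(released_1 == released_2)   # both release {a}: 3, {b}: 3, {a, b}: 2
print(dataset_distance(d1, d2))   # the datasets still differ
```

Under this toy distance, the farther apart the datasets that reproduce the same released itemsets, the less the release pins down the original data. The abstract's problem asks for k such datasets that are as far apart from one another as possible.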