Cheng Xiang, Su Sen, Xu Shengzhi, Xiong Li, Xiao Ke, Zhao Mingxing
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing China.
State Grid International Development Co., Ltd., Beijing China.
IEEE Trans Knowl Data Eng. 2018 Aug 1;30(8):1411-1425. doi: 10.1109/tkde.2018.2793862. Epub 2018 Jan 15.
Mining frequent subgraphs from a collection of input graphs is an important task for exploratory data analysis on graph data. However, if the input graphs contain sensitive information, releasing discovered frequent subgraphs may pose considerable threats to individual privacy. In this paper, we study the problem of frequent subgraph mining (FSM) under the rigorous differential privacy model. We present a two-phase differentially private FSM algorithm, which is referred to as . In , frequent subgraphs are privately identified in the first phase, and the noisy support of each identified frequent subgraph is calculated in the second phase. In particular, to privately identity frequent subgraphs, we propose a frequent subgraph identification approach, which can improve the accuracy of discovered frequent subgraphs through candidate pruning. Moreover, to compute the noisy support of each identified frequent subgraph, we devise a lattice-based noisy support computation approach, which leverages the inclusion relations between the discovered frequent subgraphs to improve the accuracy of the noisy supports. Through formal privacy analysis, we prove that satisfies -differential privacy. Extensive experimental results on real datasets show that can privately find frequent subgraphs while achieving high data utility.
从输入图集合中挖掘频繁子图是图数据探索性数据分析的一项重要任务。然而,如果输入图包含敏感信息,发布发现的频繁子图可能会对个人隐私构成相当大的威胁。在本文中,我们研究了严格差分隐私模型下的频繁子图挖掘(FSM)问题。我们提出了一种两阶段差分隐私FSM算法,称为 。在 中,频繁子图在第一阶段被私下识别,每个识别出的频繁子图的噪声支持度在第二阶段被计算。具体而言,为了私下识别频繁子图,我们提出了一种频繁子图识别方法,该方法可以通过候选剪枝提高发现的频繁子图的准确性。此外,为了计算每个识别出的频繁子图的噪声支持度,我们设计了一种基于格的噪声支持度计算方法,该方法利用发现的频繁子图之间的包含关系来提高噪声支持度的准确性。通过形式化隐私分析,我们证明 满足 -差分隐私。在真实数据集上的大量实验结果表明, 可以在实现高数据效用的同时私下找到频繁子图。