Li Haoran, Jiang Xiaoqian, Xiong Li, Liu Jinfei
Emory University, Atlanta, GA.
University of California, San Diego, La Jolla, CA.
Proc ACM Int Conf Inf Knowl Manag. 2015 Oct;2015:1001-1010. doi: 10.1145/2806416.2806441.
Differential privacy has recently become a de facto standard for private statistical data release. Many algorithms have been proposed to generate differentially private histograms or synthetic data. However, most of them focus on "one-time" release of a static dataset and do not adequately address the increasing need of releasing series of dynamic datasets in real time. A straightforward application of existing histogram methods on each snapshot of such dynamic datasets will incur high accumulated error due to the composibility of differential privacy and correlations or overlapping users between the snapshots. In this paper, we address the problem of releasing series of dynamic datasets in real time with differential privacy, using a novel adaptive distance-based sampling approach. Our first method, DSFT, uses a fixed distance threshold and releases a differentially private histogram only when the current snapshot is sufficiently different from the previous one, i.e., with a distance greater than a predefined threshold. Our second method, DSAT, further improves DSFT and uses a dynamic threshold adaptively adjusted by a feedback control mechanism to capture the data dynamics. Extensive experiments on real and synthetic datasets demonstrate that our approach achieves better utility than baseline methods and existing state-of-the-art methods.
差分隐私最近已成为私有统计数据发布的事实上的标准。已经提出了许多算法来生成差分隐私直方图或合成数据。然而,它们中的大多数都专注于静态数据集的“一次性”发布,没有充分解决实时发布动态数据集系列的日益增长的需求。由于差分隐私的可组合性以及快照之间用户的相关性或重叠性,将现有直方图方法直接应用于此类动态数据集的每个快照会导致高累积误差。在本文中,我们使用一种新颖的基于自适应距离的采样方法来解决使用差分隐私实时发布动态数据集系列的问题。我们的第一种方法DSFT使用固定距离阈值,并且仅当当前快照与前一个快照有足够差异时,即距离大于预定义阈值时,才发布差分隐私直方图。我们的第二种方法DSAT进一步改进了DSFT,并使用由反馈控制机制自适应调整的动态阈值来捕捉数据动态。在真实和合成数据集上进行的大量实验表明,我们的方法比基线方法和现有的最先进方法具有更好的实用性。