IEEE Trans Vis Comput Graph. 2021 Feb;27(2):1786-1796. doi: 10.1109/TVCG.2020.3030369. Epub 2021 Jan 28.
Differential privacy (DP) is an emerging privacy model with increasing popularity in many domains. It works by adding carefully calibrated noise to data, which blurs information about individuals while preserving overall statistics about the population. In theory, robust privacy-preserving visualizations can be produced by plotting differentially private data. However, noise-induced data perturbations can alter visual patterns and reduce the utility of a private visualization. We still know little about the challenges and opportunities for visual data exploration and analysis using private visualizations. As a first step towards filling this gap, we conducted a crowdsourced experiment measuring participants' performance under three levels of privacy (high, low, non-private) for combinations of eight analysis tasks and four visualization types (bar chart, pie chart, line chart, scatter plot). Our findings show that participants' accuracy was higher for summary tasks (e.g., find clusters in data) than for value tasks (e.g., retrieve a certain value). We also found that under DP, pie charts and line charts offer similar or better accuracy than bar charts. In this work, we contribute the results of our empirical study investigating the task-based effectiveness of basic private visualizations, a dichotomous model for defining and measuring user success in performing visual analysis tasks under DP, and a set of distribution metrics for tuning the noise injection to improve the utility of private visualizations.
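To make the idea of "carefully calibrated noise" concrete, the following is a minimal sketch of one common way to privatize the counts behind a bar chart, assuming the Laplace mechanism with a sensitivity of 1 (each individual contributes to exactly one bar). The abstract does not state which mechanism or privacy budgets the study used, so the function names, the sample counts, and the epsilon values 0.1 ("high" privacy) and 2.0 ("low" privacy) are illustrative assumptions only.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Return noisy counts under the Laplace mechanism (illustrative sketch).

    Assumption: each individual affects exactly one count by at most 1,
    so the L1 sensitivity of the whole histogram is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon -> more noise -> more privacy
    return [c + laplace_noise(scale, rng) for c in counts]


# Hypothetical bar-chart data; not taken from the study.
counts = [120, 45, 80, 15]
high_privacy = private_counts(counts, epsilon=0.1, seed=0)  # noise scale 10
low_privacy = private_counts(counts, epsilon=2.0, seed=0)   # noise scale 0.5
```

Plotting `high_privacy` and `low_privacy` next to `counts` gives a sense of how the noise scale can reorder bars or hide small values. Note that the noisy values can be negative or non-integer; whether and how to post-process them before plotting is a separate design decision that this sketch leaves open.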