Bi Xuan, Shen Xiaotong
Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, MN.
School of Statistics, University of Minnesota, Minneapolis, MN.
J Econom. 2023 Aug;235(2):444-453. doi: 10.1016/j.jeconom.2022.05.004. Epub 2022 Jun 18.
Differential privacy is becoming a gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusions of the original data analysis, because privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we mitigate this trade-off by developing a distribution-invariant privatization (DIP) method that reconciles high statistical accuracy with strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one had used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in a wide range of simulation studies and real-world benchmarks.
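To make the trade-off described above concrete, the following is a minimal sketch of how a conventional privatization step can change the sample distribution. It is not the DIP method, whose construction is not given in this abstract; it assumes the standard Laplace mechanism applied to each record, with values bounded in [0, 1]. The function names `laplace_noise` and `privatize`, the sample size, and the choice of epsilon = 1 are illustrative assumptions.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate as the difference of two
    independent exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, epsilon, lower, upper, rng):
    """Standard Laplace mechanism applied record by record (NOT DIP).

    Each value is clipped to [lower, upper], so a single record can change
    by at most (upper - lower); the noise scale is that range / epsilon.
    """
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng)
            for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [rng.uniform(0.0, 1.0) for _ in range(20000)]
private = privatize(original, epsilon=1.0, lower=0.0, upper=1.0, rng=rng)

# The mean is roughly preserved, but the variance is inflated by about
# 2 * scale**2, the variance of the added Laplace noise.
mean_original = statistics.fmean(original)
mean_private = statistics.fmean(private)
var_original = statistics.pvariance(original)
var_private = statistics.pvariance(private)

print(f"mean:     original={mean_original:.3f}  private={mean_private:.3f}")
print(f"variance: original={var_original:.3f}  private={var_private:.3f}")
```

In this sketch the privatized sample keeps approximately the same mean but has a much larger variance than the original, so any downstream analysis that depends on spread, quantiles, or distributional shape would reach a different conclusion. This is the kind of distortion a distribution-invariant approach aims to avoid.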