Ding A Adam, Miao Guanhong, Wu Samuel S
Department of Mathematics, Northeastern University, Boston, MA.
Department of Biostatistics, University of Florida, Gainesville, FL.
J Priv Confid. 2020 Jun;10(2). doi: 10.29012/jpc.674.
Privacy protection is an important requirement in many statistical studies. A recently proposed data collection method, triple matrix-masking, retains exact summary statistics without exposing the raw data at any point in the process. In this paper, we provide a theoretical formulation and proofs showing that a modified version of the procedure is strong collection obfuscating: no party in the data collection process can gain knowledge of the individual-level data, even when it holds some partially masked data in addition to the publicly published data. This provides a theoretical foundation for using such a procedure to collect masked data that allow exact statistical inference for linear models while preserving a well-defined notion of privacy protection for each individual participant in the study. This paper fits into a line of work on how to create useful synthetic data without a trustworthy data aggregator. We achieve this by splitting the trust between two parties, the "masking service provider" and the "data collector."
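The abstract does not spell out the masking itself, so the following is only a minimal sketch under a stated assumption: that the published data are the raw data matrix left-multiplied by a random orthogonal matrix. It is not the authors' full triple matrix-masking protocol (the roles of the masking service provider and the data collector are omitted), and the names `random_orthogonal`, `gram`, `raw`, and `masked` are hypothetical. It illustrates why such a mask can leave the summary statistics of a linear model exact: if D = [X | y] and A is orthogonal, then (AD)^T(AD) = D^T D, so X^T X, X^T y, and y^T y are unchanged while the individual rows are scrambled.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def householder(n, rng):
    """Return the n x n reflection I - 2 v v^T / (v^T v), which is orthogonal."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm_sq = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq
             for j in range(n)] for i in range(n)]


def random_orthogonal(n, rng, reflections=4):
    """Compose several reflections; a product of orthogonal matrices is orthogonal."""
    q = householder(n, rng)
    for _ in range(reflections - 1):
        q = matmul(householder(n, rng), q)
    return q


def gram(data):
    """Return D^T D, which holds X^T X, X^T y, and y^T y when D = [X | y]."""
    return matmul(transpose(data), data)


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two same-shaped matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
n, p = 6, 3  # assumed toy sizes: 6 participants, 3 covariates plus a response

# Hypothetical raw data: one row per participant, last column is the response.
raw = [[rng.gauss(0.0, 1.0) for _ in range(p + 1)] for _ in range(n)]

# Assumed mask: left multiplication by a random orthogonal matrix.
masked = matmul(random_orthogonal(n, rng), raw)

gram_gap = max_abs_diff(gram(raw), gram(masked))  # summary statistics
row_gap = max_abs_diff(raw, masked)               # individual-level entries

print("max change in D^T D:", gram_gap)
print("max change in individual entries:", row_gap)
```

Because the least-squares estimate and its standard errors depend on the data only through X^T X, X^T y, y^T y, and the sample size, any analysis built on those quantities gives the same answer on `masked` as on `raw`, up to floating-point rounding. The sketch says nothing about the privacy guarantee, which is what the paper's proofs address.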