Steven Ruggles, David Van Riper
Institute for Social Research and Data Innovation, University of Minnesota, Minneapolis, Minnesota, USA, 55455.
Popul Res Policy Rev. 2022 Jun;41(3):781-788. doi: 10.1007/s11113-021-09674-3. Epub 2021 Aug 22.
The Census Bureau plans a new approach to disclosure control for the 2020 census that will add noise to every statistic the agency produces for places below the state level. The Bureau argues the new approach is needed because the confidentiality of census responses is threatened by "database reconstruction," a technique for inferring individual-level responses from tabular data. The Census Bureau constructed hypothetical individual-level census responses from public 2010 tabular data and matched them to internal census records and to outside sources. The Census Bureau did not compare these results to a null model to demonstrate that their success in matching would not be expected by chance. This is analogous to conducting a clinical trial without a control group. We implement a simple simulation to assess how many matches would be expected by chance. We demonstrate that most matches reported by the Census Bureau experiment would be expected randomly. To extend the metaphor of the clinical trial, the treatment and the placebo produced similar outcomes. The database reconstruction experiment therefore fails to demonstrate a credible threat to confidentiality.
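The null-model idea in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation: the function name `chance_match_rate`, the uniform age distribution, the fixed block size, and the one-to-one matching rule are all assumptions made for illustration. It populates hypothetical census blocks with random (age, sex) records, then draws an equal number of random guesses from the same distribution and counts how many coincide with a resident of the block purely by chance.

```python
import random
from collections import Counter


def chance_match_rate(n_blocks=2000, block_size=30, n_ages=100, seed=0):
    """Share of randomly guessed (age, sex) records that match a resident
    of the same block by chance alone.

    All parameters are illustrative assumptions: ages are drawn uniformly
    from 0..n_ages-1 and every block has the same number of residents.
    """
    rng = random.Random(seed)

    def draw_person():
        # One hypothetical individual: (single year of age, sex).
        return (rng.randrange(n_ages), rng.choice("MF"))

    matched = 0
    total = 0
    for _ in range(n_blocks):
        # "True" residents of the block, stored as counts per (age, sex) cell.
        residents = Counter(draw_person() for _ in range(block_size))
        # Guesses made with no information about the block beyond its size.
        for _ in range(block_size):
            guess = draw_person()
            total += 1
            if residents[guess] > 0:
                residents[guess] -= 1  # each resident can be matched only once
                matched += 1
    return matched / total


if __name__ == "__main__":
    for size in (10, 30, 100, 300):
        rate = chance_match_rate(block_size=size)
        print(f"block size {size:>3}: chance match rate = {rate:.3f}")
```

Under these assumptions the chance match rate rises with block size, because larger blocks cover more of the possible age-sex combinations. A reported match rate is informative only to the extent that it exceeds a baseline of this kind.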