Balancing data privacy and usability in the federal statistical system.

Author information

Department of Economics, Duke University, Durham, NC 27708.

Department of Economics, University of Kentucky, Lexington, KY 40503.

Publication information

Proc Natl Acad Sci U S A. 2022 Aug 2;119(31):e2104906119. doi: 10.1073/pnas.2104906119. Epub 2022 Jul 25.

Abstract

The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data currently held by federal agencies is either not made available to researchers at all or only made available under onerous conditions. On the other hand, agencies which release public databases face new challenges in protecting the privacy of the subjects in those databases, which leads them to consider releasing fewer data or masking the data in ways that will reduce their accuracy. In this essay, we argue that the discussion has not given proper consideration to the reduced social benefits of data availability and their usability relative to the value of increased levels of privacy protection. A more balanced benefit-cost framework should be used to assess these trade-offs. We express concerns both with synthetic data methods for disclosure limitation, which will reduce the types of research that can be reliably conducted in unknown ways, and with differential privacy criteria that use what we argue is an inappropriate measure of disclosure risk. We recommend that the measure of disclosure risk used to assess all disclosure protection methods focus on what we believe is the risk that individuals should care about, that more study of the impact of differential privacy criteria and synthetic data methods on data usability for research be conducted before either is put into widespread use, and that more research be conducted on alternative methods of disclosure risk reduction that better balance benefits and costs.

Similar articles

1
Balancing data privacy and usability in the federal statistical system.
Proc Natl Acad Sci U S A. 2022 Aug 2;119(31):e2104906119. doi: 10.1073/pnas.2104906119. Epub 2022 Jul 25.
2
Privacy in confidential administrative micro data: implementing statistical disclosure control in a secure computing environment.
J Empir Res Hum Res Ethics. 2014 Dec;9(5):8-15. doi: 10.1177/1556264614552799. Epub 2014 Oct 2.
3
An in-depth examination of requirements for disclosure risk assessment.
Proc Natl Acad Sci U S A. 2023 Oct 24;120(43):e2220558120. doi: 10.1073/pnas.2220558120. Epub 2023 Oct 13.
4
Medical record confidentiality law, scientific research, and data collection in the information age.
J Law Med Ethics. 1997 Summer-Fall;25(2-3):113-29, 82. doi: 10.1111/j.1748-720x.1997.tb01887.x.
5
Driving toward guiding principles: a goal for privacy, confidentiality, and security of health information.
J Am Med Inform Assoc. 1999 Mar-Apr;6(2):122-33. doi: 10.1136/jamia.1999.0060122.
7
Legal issues concerning electronic health information: privacy, quality, and liability.
JAMA. 1999 Oct 20;282(15):1466-71. doi: 10.1001/jama.282.15.1466.
8
Privacy and confidentiality resources.
J Empir Res Hum Res Ethics. 2009 Sep;4(3):33-4. doi: 10.1525/jer.2009.4.3.33.
9
Health and the right to privacy.
Am J Law Med. 1999;25(2-3):193-201.
10
Health care information and privacy.
Health Matrix Clevel. 1998 Summer;8(2):223-32.

Cited by

1
DataSHIELD: mitigating disclosure risk in a multi-site federated analysis platform.
Bioinform Adv. 2025 Mar 10;5(1):vbaf046. doi: 10.1093/bioadv/vbaf046. eCollection 2025.
2
When Privacy Protection Goes Wrong: How and Why the 2020 Census Confidentiality Program Failed.
J Econ Perspect. 2024 Spring;38(2):201-226. doi: 10.1257/jep.38.2.201.
3
Privacy violations in election results.
Sci Adv. 2025 Mar 14;11(11):eadt1512. doi: 10.1126/sciadv.adt1512. Epub 2025 Mar 12.
4
The shortcomings of synthetic census microdata.
Proc Natl Acad Sci U S A. 2025 Mar 18;122(11):e2424655122. doi: 10.1073/pnas.2424655122. Epub 2025 Mar 6.
5
Evaluating bias and noise induced by the U.S. Census Bureau's privacy protection methods.
Sci Adv. 2024 May 3;10(18):eadl2524. doi: 10.1126/sciadv.adl2524. Epub 2024 May 1.
6
Data-driven simulations for training AI-based segmentation of neutron images.
Sci Rep. 2024 Mar 19;14(1):6614. doi: 10.1038/s41598-024-56409-3.
7
The key role of absolute risk in the disclosure risk assessment of public data releases.
Proc Natl Acad Sci U S A. 2024 Mar 12;121(11):e2321882121. doi: 10.1073/pnas.2321882121. Epub 2024 Mar 5.
8
"It's None of Their Damn Business": Privacy and Disclosure Control in the U.S. Census, 1790-2020.
Popul Dev Rev. 2023 Sep;49(3):651-679. doi: 10.1111/padr.12580. Epub 2023 Jul 24.
9
An in-depth examination of requirements for disclosure risk assessment.
Proc Natl Acad Sci U S A. 2023 Oct 24;120(43):e2220558120. doi: 10.1073/pnas.2220558120. Epub 2023 Oct 13.
10
Confidence-ranked reconstruction of census records from aggregate statistics fails to capture privacy risks and reidentifiability.
Proc Natl Acad Sci U S A. 2023 May 2;120(18):e2303890120. doi: 10.1073/pnas.2303890120. Epub 2023 Apr 24.
