Multiple Imputation for Sharing Precise Geographies in Public Use Data

Author information

Wang Hao, Reiter Jerome P

Affiliation

Department of Statistics, University of South Carolina, Columbia, South Carolina 29208, USA.

Publication information

Ann Appl Stat. 2012 Mar 1;6(1):229-252. doi: 10.1214/11-AOAS506.

Abstract

When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.
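The workflow outlined in the abstract (model latitude and longitude conditional on attributes, then draw replacement coordinates, repeated several times) might look roughly like the sketch below. It is not the authors' implementation. It assumes a small hand-rolled regression tree, a sequential factorization (latitude given attributes, then longitude given attributes and latitude), and draws taken directly from the observed values in each leaf. The function names (`grow_tree`, `leaf_values`, `synthesize`), the `min_leaf` and `m` parameters, and the toy data are all hypothetical. Because drawing observed leaf values re-releases real coordinate values, a practical version would presumably smooth or perturb the draws.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow_tree(X, y, min_leaf=5):
    """Grow a regression tree by greedy SSE splits; leaves keep observed responses."""
    best = None
    for j in range(len(X[0])):
        for cut in sorted(set(row[j] for row in X))[1:]:
            left = [i for i, row in enumerate(X) if row[j] < cut]
            right = [i for i, row in enumerate(X) if row[j] >= cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, cut, left, right)
    # Stop when no admissible split reduces the squared error.
    if best is None or best[0] >= sse(y):
        return {"leaf": list(y)}
    _, j, cut, left, right = best
    return {
        "feature": j,
        "cut": cut,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left], min_leaf),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right], min_leaf),
    }

def leaf_values(tree, row):
    """Drop one record down the tree and return the responses stored in its leaf."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] < tree["cut"] else "right"
        tree = tree[side]
    return tree["leaf"]

def synthesize(attrs, lat, lon, m=3, min_leaf=5, seed=0):
    """Return m synthetic (lat, lon) lists, one pair per original record."""
    rng = random.Random(seed)
    # Assumed factorization: lat | attributes, then lon | attributes, lat.
    lat_tree = grow_tree(attrs, lat, min_leaf)
    lon_X = [list(a) + [la] for a, la in zip(attrs, lat)]
    lon_tree = grow_tree(lon_X, lon, min_leaf)
    datasets = []
    for _ in range(m):
        new_lat = [rng.choice(leaf_values(lat_tree, a)) for a in attrs]
        new_lon = [
            rng.choice(leaf_values(lon_tree, list(a) + [la]))
            for a, la in zip(attrs, new_lat)
        ]
        datasets.append(list(zip(new_lat, new_lon)))
    return datasets

# Made-up toy data: [age, sex] attributes with coordinates loosely tied to them.
toy_rng = random.Random(1)
attrs = [[toy_rng.randint(20, 90), toy_rng.randint(0, 1)] for _ in range(60)]
lat = [36.0 + 0.001 * a[0] + toy_rng.gauss(0, 0.01) for a in attrs]
lon = [-78.9 + 0.05 * a[1] + toy_rng.gauss(0, 0.01) for a in attrs]
synthetic = synthesize(attrs, lat, lon, m=3)
```

Each element of `synthetic` is one candidate release: the attributes stay as observed while the coordinates are replaced by draws from records with similar attributes. Generating `m` such datasets is what would let secondary analysts combine estimates across them in the usual multiple-imputation fashion.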
