A method for managing re-identification risk from small geographic areas in Canada.

Affiliation

Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario K1J 8L1, Canada.

Publication information

BMC Med Inform Decis Mak. 2010 Apr 2;10:18. doi: 10.1186/1472-6947-10-18.

DOI: 10.1186/1472-6947-10-18

PMID: 20361870

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2858714/

Abstract

BACKGROUND

A common disclosure control practice for health datasets is to identify small geographic areas and then either suppress records from these areas or aggregate them into larger ones. A recent study provided a method for deciding when an area is too small, based on the uniqueness criterion. The uniqueness criterion stipulates that an area is no longer too small when the proportion of unique individuals on the relevant variables (the quasi-identifiers) approaches zero. However, a uniqueness threshold of zero is quite stringent and is only suitable when the risks from data disclosure are high. Other uniqueness thresholds that have been proposed for health data are 5% and 20%.
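The uniqueness criterion can be illustrated with a minimal sketch. It assumes that records are Python dicts and that uniqueness is measured within the dataset itself. The study instead estimates population uniqueness from a 20% census sample, and this sketch does not attempt that estimation. A record counts as unique when its combination of quasi-identifier values occurs exactly once. The function names `uniqueness` and `is_too_small` are hypothetical.

```python
from collections import Counter

# Uniqueness thresholds discussed in the abstract: 0%, 5% and 20%.
THRESHOLDS = (0.0, 0.05, 0.20)

def uniqueness(records, quasi_identifiers):
    """Proportion of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

def is_too_small(records, quasi_identifiers, threshold):
    """Hypothetical rule: an area stays 'too small' while its uniqueness exceeds the threshold."""
    return uniqueness(records, quasi_identifiers) > threshold

if __name__ == "__main__":
    # Toy records for a single area (illustrative only, not study data).
    area = [
        {"age": 30, "sex": "F"},
        {"age": 30, "sex": "F"},
        {"age": 41, "sex": "M"},
        {"age": 52, "sex": "F"},
    ]
    print(uniqueness(area, ["age", "sex"]))  # 0.5: two of the four records are unique
    for t in THRESHOLDS:
        print(t, is_too_small(area, ["age", "sex"], t))
```

With the 0% threshold, a single unique record is enough to flag an area. The 5% and 20% thresholds tolerate some uniqueness before an area is treated as too small.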

METHODS

We estimated uniqueness for urban Forward Sortation Areas (FSAs) using the 2001 long-form Canadian census data, which represent 20% of the population. We then constructed two logistic regression models to predict whether uniqueness exceeds the 5% and 20% thresholds, and we validated their predictive accuracy using 10-fold cross-validation. Predictor variables included the population size of the FSA and the maximum number of possible combinations of quasi-identifier values (the number of equivalence classes).
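The modelling step can be sketched in plain Python as a two-predictor logistic regression, fitted by gradient descent and assessed with 10-fold cross-validation. This is an illustrative sketch, not the authors' code. The following choices are all assumptions:

- both predictors enter on the log scale;
- features are standardized;
- the learning rate and epoch count are as shown;
- the function names are as shown.

The maximum number of equivalence classes is computed as the product of the number of possible values of each quasi-identifier. One such model would be fitted per threshold (5% or 20%), with `y` marking whether an FSA's uniqueness exceeds that threshold.

```python
import math
import random

def max_equivalence_classes(cardinalities):
    """Product of the number of possible values of each quasi-identifier."""
    return math.prod(cardinalities)

def make_features(population, n_classes):
    # Assumption: both predictors on the log scale; the abstract does not state the transformation.
    return [math.log(population), math.log(n_classes)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the log-loss, with features standardized on the training data."""
    n, d = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(d)]
    stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x in X) / n) or 1.0 for j in range(d)]
    Z = [[(x[j] - means[j]) / stds[j] for j in range(d)] for x in X]
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for zi, yi in zip(Z, y):
            err = sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], zi))) - yi
            grad[0] += err
            for j, zj in enumerate(zi):
                grad[j + 1] += err * zj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return means, stds, w

def predict(model, x, cutoff=0.5):
    """True if the model predicts uniqueness above the threshold it was trained for."""
    means, stds, w = model
    z = [(xj - m) / s for xj, m, s in zip(x, means, stds)]
    return sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))) >= cutoff

def cross_validated_predictions(X, y, k=10, seed=0):
    """k-fold cross-validation: each FSA is predicted by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    preds = [None] * len(X)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(model, X[i])
    return preds
```

Consider a hypothetical FSA of 25,000 people whose quasi-identifiers allow 1,200 value combinations. Calling `predict(fit_logistic(X, y), make_features(25000, 1200))` returns whether that FSA is predicted to exceed the threshold the model was trained for. A statistics package would normally be used for the fit; the hand-written version only keeps the sketch dependency-free.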

RESULTS

All model parameters were statistically significant, and the models had very high prediction accuracy. Specificity was above 0.9 for both models, and sensitivity was 0.87 for the 5% threshold model and 0.74 for the 20% threshold model. We illustrate the application of the models with analyses of the Ontario newborn registry and an emergency department dataset. At the higher thresholds, considerably fewer records than at the 0% threshold would be classified as being in small areas and would therefore undergo disclosure control actions. We also provide concrete guidance for data custodians on deciding which of the three uniqueness thresholds to use (0%, 5%, 20%). The choice depends on three factors:

- the mitigating controls that the data recipients have in place;
- the potential invasion of privacy if the data are disclosed;
- the motives and capacity of the data recipient to re-identify the data.
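The sensitivity and specificity reported above can be computed from cross-validated predictions. Here, an FSA whose uniqueness truly exceeds the threshold is treated as a positive case. This is a minimal sketch, and the function name and toy labels are hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

if __name__ == "__main__":
    # Toy labels: 1 = uniqueness above threshold (area too small), 0 = not.
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(actual, predicted)
    print(sens, round(spec, 3))  # 0.75 0.833
```

In this setting, sensitivity is the share of truly too-small areas that the model flags. A miss means an area that needed disclosure control would be released unchanged.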

CONCLUSION

The models we developed can be used to manage the re-identification risk from small geographic areas. By choosing among three possible thresholds, a data custodian can adapt the definition of "small geographic area" to the nature of the data and the recipient.
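The choice of threshold changes which areas count as small. The sketch below counts the records that fall in areas whose estimated uniqueness exceeds each threshold. The FSA codes, uniqueness estimates and record counts are invented for illustration. In practice, the uniqueness estimates would come from predictive models rather than being known directly.

```python
# Hypothetical per-FSA inputs: (estimated uniqueness, number of records in the dataset).
AREAS = {
    "A1A": (0.02, 120),
    "B2B": (0.12, 80),
    "C3C": (0.35, 40),
}

def records_needing_control(areas, threshold):
    """Total records in areas whose uniqueness exceeds the threshold (i.e. 'small' areas)."""
    return sum(n for uniq, n in areas.values() if uniq > threshold)

if __name__ == "__main__":
    for t in (0.0, 0.05, 0.20):
        print(f"{t:.0%}: {records_needing_control(AREAS, t)} records")
    # 0%: 240 records
    # 5%: 120 records
    # 20%: 40 records
```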



Similar articles

Evaluating predictors of geographic area population size cut-offs to manage re-identification risk.

J Am Med Inform Assoc. 2009 Mar-Apr;16(2):256-66. doi: 10.1197/jamia.M2902. Epub 2008 Dec 11.

Area-level global and local clustering of human Salmonella Enteritidis infection rates in the city of Toronto, Canada, 2007-2009.

BMC Infect Dis. 2015 Aug 21;15:359. doi: 10.1186/s12879-015-1106-6.

Evaluating common de-identification heuristics for personal health information.

J Med Internet Res. 2006 Nov 21;8(4):e28. doi: 10.2196/jmir.8.4.e28.

The re-identification risk of Canadians from longitudinal demographics.

BMC Med Inform Decis Mak. 2011 Jun 22;11:46. doi: 10.1186/1472-6947-11-46.

Comprehensive analysis of cutaneous T-cell lymphoma (CTCL) incidence and mortality in Canada reveals changing trends and geographic clustering for this malignancy.

Cancer. 2017 Sep 15;123(18):3550-3567. doi: 10.1002/cncr.30758. Epub 2017 May 10.

Estimating the re-identification risk of clinical data sets.

BMC Med Inform Decis Mak. 2012 Jul 9;12:66. doi: 10.1186/1472-6947-12-66.

Examining variations in health within rural Canada.

Rural Remote Health. 2012;12:1848. Epub 2012 Feb 29.

First-generation immigrants and hospital admission rates for psychosis and affective disorders: an ecological study in Ontario.

Can J Psychiatry. 2011 Jul;56(7):418-26. doi: 10.1177/070674371105600705.

Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation.

J Med Internet Res. 2020 Nov 16;22(11):e23139. doi: 10.2196/23139.

Cited by

People with disability and privacy in precision medicine research: what's at stake?

Trends Genet. 2023 May;39(5):335-337. doi: 10.1016/j.tig.2023.01.001. Epub 2023 Jan 25.

Analysis of Geographic and Environmental Factors and Their Association with Cutaneous Melanoma Incidence in Canada.

Dermatology. 2022;238(6):1006-1017. doi: 10.1159/000524949. Epub 2022 Jun 9.

Confidentiality considerations for use of social-spatial data on the social determinants of health: Sexual and reproductive health case study.

Soc Sci Med. 2016 Oct;166:49-56. doi: 10.1016/j.socscimed.2016.08.009. Epub 2016 Aug 8.

Utility of linking primary care electronic medical records with Canadian census data to study the determinants of chronic disease: an example based on socioeconomic status and obesity.

BMC Med Inform Decis Mak. 2016 Mar 11;16:32. doi: 10.1186/s12911-016-0272-9.

Anonymisation of geographical distance matrices via Lipschitz embedding.

Int J Health Geogr. 2016 Jan 7;15:1. doi: 10.1186/s12942-015-0031-7.

Estimating the re-identification risk of clinical data sets.

BMC Med Inform Decis Mak. 2012 Jul 9;12:66. doi: 10.1186/1472-6947-12-66.

Understanding identifiability in secondary health data.

Can J Public Health. 2011 Jul-Aug;102(4):291-3. doi: 10.1007/BF03404051.

The re-identification risk of Canadians from longitudinal demographics.

BMC Med Inform Decis Mak. 2011 Jun 22;11:46. doi: 10.1186/1472-6947-11-46.

Methods for the de-identification of electronic health records for genomic research.

Genome Med. 2011 Apr 27;3(4):25. doi: 10.1186/gm239.

A secure protocol for protecting the identity of providers when disclosing data for disease surveillance.

J Am Med Inform Assoc. 2011 May 1;18(3):212-7. doi: 10.1136/amiajnl-2011-000100.

References

Evaluating the Risk of Re-identification of Patients from Hospital Prescription Records.

Can J Hosp Pharm. 2009 Jul;62(4):307-19. doi: 10.4212/cjhp.v62i4.812.

Evaluating predictors of geographic area population size cut-offs to manage re-identification risk.

J Am Med Inform Assoc. 2009 Mar-Apr;16(2):256-66. doi: 10.1197/jamia.M2902. Epub 2008 Dec 11.

Evaluating common de-identification heuristics for personal health information.

J Med Internet Res. 2006 Nov 21;8(4):e28. doi: 10.2196/jmir.8.4.e28.

Method to assess identifiability in electronic data files.

Am J Epidemiol. 2007 Mar 1;165(5):597-601. doi: 10.1093/aje/kwk049. Epub 2006 Dec 20.

Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper.

J Am Med Inform Assoc. 2007 Jan-Feb;14(1):1-9. doi: 10.1197/jamia.M2273. Epub 2006 Oct 31.

Privacy protection versus cluster detection in spatial epidemiology.

Am J Public Health. 2006 Nov;96(11):2002-8. doi: 10.2105/AJPH.2005.069526. Epub 2006 Oct 3.

Confidentiality and confidence: is data aggregation a means to achieve both?

J Public Health Policy. 2005 Dec;26(4):430-49. doi: 10.1057/palgrave.jphp.3200029.

Accuracy of city postal code coordinates as a proxy for location of residence.

Int J Health Geogr. 2004 Mar 18;3(1):5. doi: 10.1186/1476-072X-3-5.

Towards evidence-based, GIS-driven national spatial health information infrastructure and surveillance services in the United Kingdom.

Int J Health Geogr. 2004 Jan 28;3(1):1. doi: 10.1186/1476-072X-3-1.

GIS and health care.

Annu Rev Public Health. 2003;24:25-42. doi: 10.1146/annurev.publhealth.24.012902.141012. Epub 2002 Oct 23.
