Estimating Identification Disclosure Risk Using Mixed Membership Models

Authors

Daniel Manrique-Vallier, Jerome P. Reiter

Affiliations

Postdoctoral Associate at the Social Science Research Institute and the Department of Statistical Science, Duke University, Durham, NC 27708-0251.

Mrs. Alexander Hehmeyer Associate Professor of Statistical Science, Duke University, Durham, NC 27708-0251.

Publication information

J Am Stat Assoc. 2012 Dec 1;107(500):1385-1394. doi: 10.1080/01621459.2012.710508.

DOI: 10.1080/01621459.2012.710508
PMID: 25214699
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4159106/
Abstract

Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristics (keys). Successful links are particularly problematic for data subjects with combinations of keys that are unique in the population. Hence, as part of their assessments of disclosure risks, many data stewards estimate the probabilities that sample uniques on sets of discrete keys are also population uniques on those keys. This is typically done using log-linear modeling on the keys. However, log-linear models can yield biased estimates of cell probabilities for sparse contingency tables with many zero counts, which often occurs in databases with many keys. This bias can result in unreliable estimates of probabilities of uniqueness and, hence, misrepresentations of disclosure risks. We propose an alternative to log-linear models for datasets with sparse keys based on a Bayesian version of grade of membership (GoM) models. We present a Bayesian GoM model for multinomial variables and offer an MCMC algorithm for fitting the model. We evaluate the approach by treating data from a recent US Census Bureau public use microdata sample as a population, taking simple random samples from that population, and benchmarking estimated probabilities of uniqueness against population values. Compared to log-linear models, GoM models provide more accurate estimates of the total number of uniques in the samples. Additionally, they offer record-level predictions of uniqueness that dominate those based on log-linear models.
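
The benchmarking step described in the abstract (treat a dataset as the population, draw a simple random sample, and compare sample uniques against population uniques) can be illustrated with a small sketch. This is not the authors' code and it does not fit a GoM or log-linear model; it only computes the ground-truth quantities that such models try to estimate. The function name `sample_unique_summary`, the toy key values, and the record layout (one tuple of key values per record) are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_unique_summary(population, sample_size, seed=0):
    """Draw a simple random sample without replacement and count how many
    sample-unique key combinations are also unique in the population.

    `population` is a list of hashable key combinations (hypothetical layout:
    one tuple of discrete key values per record).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)

    pop_counts = Counter(population)
    sample_counts = Counter(sample)

    # Key combinations that appear exactly once in the sample.
    sample_uniques = [k for k, c in sample_counts.items() if c == 1]
    # Of those, the combinations that also appear exactly once in the population.
    also_pop_unique = [k for k in sample_uniques if pop_counts[k] == 1]

    return {
        "sample_uniques": len(sample_uniques),
        "population_uniques_among_them": len(also_pop_unique),
    }


# Toy population with made-up keys (sex, age band, state); illustrative only.
toy_population = (
    [("F", "30-39", "NC")] * 50
    + [("M", "30-39", "NC")] * 40
    + [("F", "70-79", "NC")] * 2
    + [("M", "80+", "VA")]
)

summary = sample_unique_summary(toy_population, sample_size=20, seed=1)
print(summary)
```

In a real risk assessment the population counts are unknown, which is why a model is needed to estimate the probability that a sample unique is also a population unique. A check like this one is only possible when, as in the paper's evaluation, a fully observed dataset stands in for the population.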