Disclosure control using partially synthetic data for large-scale health surveys, with applications to CanCORS.

Authors

Loong Bronwyn, Zaslavsky Alan M, He Yulei, Harrington David P

Affiliation

Research School of Finance, Actuarial Studies and Applied Statistics, The Australian National University, Canberra, ACT 0200, Australia.

Publication information

Stat Med. 2013 Oct 30;32(24):4139-61. doi: 10.1002/sim.5841. Epub 2013 May 13.

Abstract

Statistical agencies have begun to partially synthesize public-use data for major surveys to protect the confidentiality of respondents' identities and sensitive attributes by replacing high disclosure risk and sensitive variables with multiple imputations. To date, there are few applications of synthetic data techniques to large-scale healthcare survey data. Here, we describe partial synthesis of survey data collected by the Cancer Care Outcomes Research and Surveillance (CanCORS) project, a comprehensive observational study of the experiences, treatments, and outcomes of patients with lung or colorectal cancer in the USA. We review inferential methods for partially synthetic data and discuss selection of high disclosure risk variables for synthesis, specification of imputation models, and identification disclosure risk assessment. We evaluate data utility by replicating published analyses and comparing results using original and synthetic data and discuss practical issues in preserving inferential conclusions. We found that important subgroup relationships must be included in the synthetic data imputation model, to preserve the data utility of the observed data for a given analysis procedure. We conclude that synthetic CanCORS data are suited best for preliminary data analyses purposes. These methods address the requirement to share data in clinical research without compromising confidentiality.
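
The abstract refers to inferential methods for partially synthetic data without stating them. As a rough illustration only, the sketch below applies the combining rules commonly used for partially synthetic data (usually attributed to Reiter, 2003); it is an assumption that these are the rules the paper reviews. The function name `combine_partially_synthetic` and the input numbers are hypothetical and are not taken from CanCORS.

```python
import math
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine results from m partially synthetic datasets.

    estimates: point estimate q_i computed on each synthetic dataset
    variances: estimated variance u_i of q_i on each synthetic dataset
    Returns (q_bar, t_p, df). Assumed combining rules, not the paper's code.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = mean(estimates)      # overall point estimate
    b_m = variance(estimates)    # between-dataset variance (m - 1 denominator)
    u_bar = mean(variances)      # average within-dataset variance

    # Total variance for partially synthetic data: u_bar + b_m / m
    t_p = u_bar + b_m / m

    # Degrees of freedom of the reference t distribution
    if b_m > 0:
        df = (m - 1) * (1 + u_bar / (b_m / m)) ** 2
    else:
        df = math.inf  # no between-dataset variation

    return q_bar, t_p, df


if __name__ == "__main__":
    # Made-up numbers for illustration, e.g. one regression coefficient
    # estimated on each of three synthetic datasets.
    q_bar, t_p, df = combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(q_bar, t_p, df)
```

An interval estimate would then take the form q_bar plus or minus a t quantile (with df degrees of freedom) times the square root of t_p. Comparing such intervals between the original and the synthetic data is one way to carry out the kind of utility check the abstract describes.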

Similar articles

Using spatiotemporal models to generate synthetic data for public use.
Spat Spatiotemporal Epidemiol. 2018 Nov;27:37-45. doi: 10.1016/j.sste.2018.08.004. Epub 2018 Aug 31.

Multiple imputation in a large-scale complex survey: a practical guide.
Stat Methods Med Res. 2010 Dec;19(6):653-70. doi: 10.1177/0962280208101273. Epub 2009 Aug 4.

Communicating disclosure risk in informed consent statements.
J Empir Res Hum Res Ethics. 2010 Sep;5(3):1-8. doi: 10.1525/jer.2010.5.3.1.

Cited by

Synthetic data in health care: A narrative review.
PLOS Digit Health. 2023 Jan 6;2(1):e0000082. doi: 10.1371/journal.pdig.0000082. eCollection 2023 Jan.

Confidence interval estimation in R-DAS.
Drug Alcohol Depend. 2014 Oct 1;143:95-104. doi: 10.1016/j.drugalcdep.2014.07.017. Epub 2014 Aug 17.
