A two-step semiparametric method to accommodate sampling weights in multiple imputation.

Author information

Zhou Hanzhi, Elliott Michael R, Raghunathan Trivellore E

Affiliations

Mathematica Policy Research, Princeton, New Jersey, U.S.A.

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

Publication information

Biometrics. 2016 Mar;72(1):242-52. doi: 10.1111/biom.12413. Epub 2015 Sep 22.

DOI:10.1111/biom.12413

PMID:26393409

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6058975/

Abstract

Multiple imputation (MI) is a well-established method to handle item-nonresponse in sample surveys. Survey data obtained from complex sampling designs often involve features that include unequal probability of selection. MI requires imputation to be congenial, that is, for the imputations to come from a Bayesian predictive distribution and for the observed and complete data estimator to equal the posterior mean given the observed or complete data, and similarly for the observed and complete variance estimator to equal the posterior variance given the observed or complete data; more colloquially, the analyst and imputer make similar modeling assumptions. Yet multiply imputed data sets from complex sample designs with unequal sampling weights are typically imputed under simple random sampling assumptions and then analyzed using methods that account for the sampling weights. This is a setting in which the analyst assumes more than the imputer, which can lead to biased estimates and anti-conservative inference. Less commonly used alternatives such as including case weights as predictors in the imputation model typically require interaction terms for more complex estimators such as regression coefficients, and can be vulnerable to model misspecification and difficult to implement. We develop a simple two-step MI framework that accounts for sampling weights using a weighted finite population Bayesian bootstrap method to validly impute the whole population (including item nonresponse) from the observed data. In the second step, having generated posterior predictive distributions of the entire population, we use standard IID imputation to handle the item nonresponse. Simulation results show that the proposed method has good frequentist properties and is robust to model misspecification compared to alternative approaches. We apply the proposed method to accommodate missing data in the Behavioral Risk Factor Surveillance System when estimating means and parameters of regression models.
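
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single survey variable with missing values coded as `None`, sampling weights that are at least 1 and sum to the population size, a weighted Pólya urn as a simplified form of the weighted finite population Bayesian bootstrap, and a random hot-deck draw as a stand-in for whatever standard IID imputation model one would actually use. All function names are hypothetical.

```python
import random

def weighted_polya_draw(weights, n_extra, rng):
    """Step 1 helper: draw n_extra sample indices from a weighted Polya urn.

    Assumption: unit i starts with mass w_i - 1 (the unsampled units it
    represents), and each selection adds n_extra / n to its mass.
    """
    n = len(weights)
    bump = n_extra / n
    mass = [w - 1.0 for w in weights]
    draws = []
    for _ in range(n_extra):
        i = rng.choices(range(n), weights=mass, k=1)[0]
        draws.append(i)
        mass[i] += bump
    return draws

def synthetic_population(sample, weights, rng):
    """Step 1: expand the sample (missing values included) to size N."""
    n_pop = round(sum(weights))  # assumes the weights sum to N
    extra = weighted_polya_draw(weights, n_pop - len(sample), rng)
    return list(sample) + [sample[i] for i in extra]

def impute_iid(population, rng):
    """Step 2: fill missing values as if the population were an IID sample.

    A random hot-deck draw is used here purely for illustration.
    """
    donors = [y for y in population if y is not None]
    return [y if y is not None else rng.choice(donors) for y in population]

def two_step_mean(sample, weights, n_synthetic=20, seed=0):
    """Average the completed-population mean over several synthetic populations."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_synthetic):
        completed = impute_iid(synthetic_population(sample, weights, rng), rng)
        means.append(sum(completed) / len(completed))
    return sum(means) / len(means)

sample = [1.0, None, 3.0, 5.0]
weights = [2.0, 3.0, 4.0, 1.0]  # sums to an assumed population size of 10
estimate = two_step_mean(sample, weights)
```

Because the weights are absorbed when the synthetic population is built, the imputation step itself needs no weight terms or interactions. The sketch returns only a point estimate; variance estimation requires combining rules that are not shown here.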

Similar articles

Parametric and semiparametric model-based estimates of the finite population mean for two-stage cluster samples with item nonresponse.

Biometrics. 2007 Dec;63(4):1172-80. doi: 10.1111/j.1541-0420.2007.00816.x. Epub 2007 May 8.

A nonparametric multiple imputation approach for missing categorical data.

BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.

Multiple imputation by predictive mean matching in cluster-randomized trials.

BMC Med Res Methodol. 2020 Mar 30;20(1):72. doi: 10.1186/s12874-020-00948-6.

Multiple imputation for missing data via sequential regression trees.

Am J Epidemiol. 2010 Nov 1;172(9):1070-6. doi: 10.1093/aje/kwq260. Epub 2010 Sep 14.

Synthetic Multiple-Imputation Procedure for Multistage Complex Samples.

J Off Stat. 2016 Mar;32(1):231-256. doi: 10.1515/JOS-2016-0011. Epub 2016 Mar 10.

Cox regression analysis with missing covariates via nonparametric multiple imputation.

Stat Methods Med Res. 2019 Jun;28(6):1676-1688. doi: 10.1177/0962280218772592. Epub 2018 May 2.

Analysis of longitudinal clinical trials with missing data using multiple imputation in conjunction with robust regression.

Biometrics. 2012 Dec;68(4):1250-9. doi: 10.1111/j.1541-0420.2012.01780.x. Epub 2012 Sep 20.

Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap.

J Surv Stat Methodol. 2016 Jun 1;4(2):139-170. doi: 10.1093/jssam/smv031. Epub 2016 Jan 31.

Graphical and numerical diagnostic tools to assess suitability of multiple imputations and imputation models.

Stat Med. 2016 Jul 30;35(17):3007-20. doi: 10.1002/sim.6926. Epub 2016 Mar 7.

Cited by

On the Use of Auxiliary Variables in Multilevel Regression and Poststratification.

Stat Sci. 2025 May;40(2):272-288. doi: 10.1214/24-sts932. Epub 2025 Jun 2.

Multiple imputation of missing data with skip-pattern covariates: a comparison of alternative strategies.

J Stat Comput Simul. 2023;94(7):1543-1570. doi: 10.1080/00949655.2023.2293124.

A SEMIPARAMETRIC MULTIPLE IMPUTATION APPROACH TO FULLY SYNTHETIC DATA FOR COMPLEX SURVEYS.

J Surv Stat Methodol. 2022 Jun;10(3):618-641. doi: 10.1093/jssam/smac016. Epub 2022 May 25.

Identifying dietary consumption patterns from survey data: a Bayesian nonparametric latent class model.

J R Stat Soc Ser A Stat Soc. 2023 Dec 12;187(2):496-512. doi: 10.1093/jrsssa/qnad135. eCollection 2024 Apr.

Bayesian estimation methods for survey data with potential applications to health disparities research.

Wiley Interdiscip Rev Comput Stat. 2024 Jan-Feb;16(1). doi: 10.1002/wics.1633. Epub 2023 Aug 28.

Multiple imputation with missing data indicators.

Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.
