Zhou Hanzhi, Elliott Michael R, Raghunathan Trivellore E
Mathematica Policy Research, Princeton, New Jersey, U.S.A.
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
Biometrics. 2016 Mar;72(1):242-52. doi: 10.1111/biom.12413. Epub 2015 Sep 22.
Multiple imputation (MI) is a well-established method to handle item nonresponse in sample surveys. Survey data obtained from complex sampling designs often involve features that include unequal probabilities of selection. MI requires imputation to be congenial: the imputations must come from a Bayesian predictive distribution, the observed- and complete-data estimators must equal the posterior mean given the observed or complete data, respectively, and, similarly, the observed- and complete-data variance estimators must equal the corresponding posterior variance. More colloquially, the analyst and the imputer make similar modeling assumptions. Yet multiply imputed data sets from complex sample designs with unequal sampling weights are typically imputed under simple random sampling assumptions and then analyzed using methods that account for the sampling weights. In this setting the analyst assumes more than the imputer, which can lead to biased estimates and anti-conservative inference. Less commonly used alternatives, such as including case weights as predictors in the imputation model, typically require interaction terms for more complex estimators such as regression coefficients, and can be vulnerable to model misspecification and difficult to implement. We develop a simple two-step MI framework that accounts for sampling weights. In the first step, a weighted finite population Bayesian bootstrap method is used to validly impute the whole population (including item nonresponse) from the observed data. In the second step, having generated posterior predictive distributions of the entire population, we use standard IID imputation to handle the item nonresponse. Simulation results show that the proposed method has good frequentist properties and is more robust to model misspecification than alternative approaches.
We apply the proposed method to accommodate missing data in the Behavioral Risk Factor Surveillance System when estimating means and parameters of regression models.
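The two steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions. The selection probabilities in `wfpbb_population` follow the weighted Pólya urn scheme as it is commonly described for the weighted finite population Bayesian bootstrap. Under that scheme, unit i is drawn with probability proportional to w_i - 1 + l_i(N - n)/n, where the weights are normalized to sum to N and l_i counts the earlier draws of unit i. Check this scheme against the original sources before relying on it. The second-step "standard IID imputation" is represented here by an approximate Bayesian bootstrap hot deck for a single variable with no covariates. The helper names `normalize_weights`, `wfpbb_population`, `abb_impute`, and `two_step_mi_mean` are hypothetical. The combining rules for variance estimation are not shown.

```python
import random
import statistics


def normalize_weights(weights, N):
    """Rescale sampling weights so they sum to the population size N."""
    total = sum(weights)
    scaled = [w * N / total for w in weights]
    # The urn below gives unit i an initial mass of w_i - 1, so w_i >= 1 is needed.
    if min(scaled) < 1.0 - 1e-9:
        raise ValueError("normalized weights must all be >= 1")
    return scaled


def wfpbb_population(sample, weights, N, rng):
    """Step 1 (sketch): draw one synthetic population of size N.

    ASSUMED weighted Polya urn: at each of the N - n draws, unit i is selected
    with probability proportional to w_i - 1 + l_i * (N - n) / n, where l_i is
    the number of times unit i has been drawn so far. `weights` must already
    sum to N. Missing values (None) are copied along unchanged, so the
    synthetic population carries the item nonresponse with it.
    """
    n = len(sample)
    if N < n:
        raise ValueError("N must be at least the sample size n")
    step = (N - n) / n                          # urn reinforcement per selection
    urn = [max(w - 1.0, 0.0) for w in weights]  # initial mass w_i - 1
    population = list(sample)                   # observed units belong to the population
    for _ in range(N - n):
        i = rng.choices(range(n), weights=urn, k=1)[0]
        population.append(sample[i])
        urn[i] += step
    return population


def abb_impute(values, rng):
    """Step 2 stand-in: approximate Bayesian bootstrap hot deck, one variable.

    The synthetic population is treated as IID, so no weights are used here.
    """
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values to donate")
    pool = rng.choices(donors, k=len(donors))  # bootstrap the donor pool first
    return [v if v is not None else rng.choice(pool) for v in values]


def two_step_mi_mean(sample, weights, N, B, rng):
    """Average of completed-population means over B synthetic populations."""
    scaled = normalize_weights(weights, N)
    estimates = []
    for _ in range(B):
        population = wfpbb_population(sample, scaled, N, rng)
        completed = abb_impute(population, rng)
        estimates.append(statistics.fmean(completed))
    return statistics.fmean(estimates), estimates


if __name__ == "__main__":
    rng = random.Random(2016)
    y = [3.0, None, 5.0, 4.0, None, 8.0, 7.0, 6.0]  # None marks item nonresponse
    w = [1.0, 2.0, 4.0, 1.5, 3.0, 6.0, 2.5, 5.0]    # hypothetical sampling weights
    point, per_population = two_step_mi_mean(y, w, N=200, B=20, rng=rng)
    print(round(point, 3), len(per_population))
```

The urn already reflects the unequal selection probabilities in each synthetic population, so the second step can ignore the weights entirely. In this sketch the imputer's and the analyst's assumptions therefore line up.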