Department of Statistics, Columbia University, New York, New York.
Survey Research Center, University of Michigan, Ann Arbor, Michigan.
Stat Med. 2018 Nov 20;37(26):3849-3868. doi: 10.1002/sim.7892. Epub 2018 Jul 4.
Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design in which clusters are first selected with probability proportional to cluster size, and units are then randomly sampled within the selected clusters. Challenges arise when the sizes of the nonsampled clusters are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes. This inference is performed simultaneously with the model for the survey outcome, and computation is carried out in the open-source Bayesian inference engine Stan. Simulation studies show that the integrated Bayesian approach outperforms classical methods with efficiency gains, especially under informative cluster sampling designs with a small number of selected clusters. We apply the method to the Fragile Families and Child Wellbeing Study as an illustration of inference for complex health surveys.
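The two-stage design described above can be illustrated with a minimal Python sketch. It is not the Bayesian method of the paper; it only simulates the sampling design and computes a classical design-based (Hansen-Hurwitz type) estimate of the population mean, the kind of baseline the abstract compares against. Several things here are assumptions made for illustration: the synthetic population, drawing clusters with replacement in the first stage, taking the same number of units from every selected cluster, and all function names (`make_population`, `two_stage_pps_sample`, `hansen_hurwitz_mean`).

```python
import random


def make_population(num_clusters, rng):
    """Build a hypothetical finite population as a list of clusters.

    Each cluster is a list of unit-level outcomes. The cluster mean is made
    to depend on the cluster size, so that size-proportional selection is
    informative about the outcome (an assumption of this sketch).
    """
    population = []
    for _ in range(num_clusters):
        size = rng.randint(20, 200)
        cluster_mean = 0.05 * size + rng.gauss(0, 1)
        population.append([cluster_mean + rng.gauss(0, 1) for _ in range(size)])
    return population


def two_stage_pps_sample(population, n_clusters, m_units, rng):
    """Draw a two-stage sample.

    Stage 1: select clusters with probability proportional to size,
             with replacement (assumed here for simplicity).
    Stage 2: simple random sample of m_units without replacement
             inside each selected cluster.
    Returns a list of (cluster_index, sampled_outcomes) pairs.
    """
    sizes = [len(cluster) for cluster in population]
    chosen = rng.choices(range(len(population)), weights=sizes, k=n_clusters)
    return [(j, rng.sample(population[j], min(m_units, sizes[j]))) for j in chosen]


def hansen_hurwitz_mean(sample):
    """Design-based estimate of the population mean.

    Under size-proportional selection with replacement and an equal number
    of units per selected cluster, the estimator reduces to the plain
    average of the within-cluster sample means.
    """
    cluster_means = [sum(ys) / len(ys) for _, ys in sample]
    return sum(cluster_means) / len(cluster_means)


if __name__ == "__main__":
    rng = random.Random(2018)
    population = make_population(num_clusters=50, rng=rng)
    sample = two_stage_pps_sample(population, n_clusters=10, m_units=5, rng=rng)
    true_mean = sum(sum(c) for c in population) / sum(len(c) for c in population)
    print("design-based estimate:", hansen_hurwitz_mean(sample))
    print("true population mean: ", true_mean)
```

Under these assumptions the estimator is unbiased for the population mean but can be noisy when only a few clusters are selected, which is the setting where the abstract reports efficiency gains for the integrated Bayesian approach.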