Little Roderick J A, West Brady T, Boonstra Philip S, Hu Jingwei
Professor of Biostatistics at the School of Public Health and Research Professor in the Survey Methodology Program (SMP), Survey Research Center (SRC), Institute for Social Research (ISR), University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA.
Research Associate Professor in the Survey Methodology Program (SMP), Survey Research Center (SRC), Institute for Social Research (ISR), University of Michigan, 426 Thompson Street, Ann Arbor, MI 48106-1248, USA.
J Surv Stat Methodol. 2020 Nov;8(5):932-964. doi: 10.1093/jssam/smz023. Epub 2019 Aug 29.
With the current focus of survey researchers on "big data" that are not selected by probability sampling, measures of the degree of potential sampling bias arising from this nonrandom selection are sorely needed. Existing indices of this degree of departure from probability sampling, like the R-indicator, are based on functions of the propensity of inclusion in the sample, estimated by modeling the inclusion probability as a function of auxiliary variables. These methods are agnostic about the relationship between the inclusion probability and survey outcomes, which is a crucial feature of the problem. We propose a simple index of degree of departure from ignorable sample selection that corrects this deficiency, which we call the standardized measure of unadjusted bias (SMUB). The index is based on normal pattern-mixture models for nonresponse applied to this sample selection problem and is grounded in the model-based framework of nonignorable selection first proposed in the context of nonresponse by Don Rubin in 1976. The index depends on an inestimable parameter that measures the deviation from selection at random, which ranges between the values zero and one. We propose the use of a central value of this parameter, 0.5, for computing a point index, and computing the values of SMUB at zero and one to provide a range of the index in a sensitivity analysis. We also provide a fully Bayesian approach for computing credible intervals for the SMUB, reflecting uncertainty in the values of all of the input parameters. The proposed methods have been implemented in R and are illustrated using real data from the National Survey of Family Growth.
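The abstract describes the index only in words, so the following is a minimal sketch of how a point value and sensitivity range might be computed, not the authors' R implementation. It assumes the form usually given for proxy pattern-mixture models: the standardized difference between the selected-sample mean and the population mean of an auxiliary proxy X, scaled by a factor that depends on the sensitivity parameter phi and on the correlation between X and the outcome Y in the selected sample. That formula, the function name `smub`, and all argument names are assumptions introduced here for illustration and should be checked against the paper before use.

```python
import math


def smub(phi, rho_xy, xbar_selected, xbar_population, sd_x_selected):
    """Sketch of a standardized measure of unadjusted bias (assumed form).

    phi             -- sensitivity parameter in [0, 1]; 0 means selection
                       depends only on the proxy X, 1 means only on Y
    rho_xy          -- correlation of proxy X and outcome Y among selected
                       cases (assumed positive)
    xbar_selected   -- mean of X among selected cases
    xbar_population -- mean of X in the full population
    sd_x_selected   -- standard deviation of X among selected cases
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie between 0 and 1")
    if sd_x_selected <= 0.0:
        raise ValueError("sd_x_selected must be positive")
    denominator = phi * rho_xy + (1.0 - phi)
    if denominator == 0.0:
        raise ValueError("index is undefined when phi = 1 and rho_xy = 0")
    # Scaling factor: equals rho_xy at phi = 0, 1 at phi = 0.5, 1/rho_xy at phi = 1.
    scale = (phi + (1.0 - phi) * rho_xy) / denominator
    standardized_difference = (xbar_selected - xbar_population) / sd_x_selected
    return scale * standardized_difference


# Hypothetical inputs, chosen only to show the point value and the range.
inputs = dict(rho_xy=0.5, xbar_selected=1.2, xbar_population=1.0, sd_x_selected=2.0)
point = smub(0.5, **inputs)   # central value of phi
lower = smub(0.0, **inputs)   # one end of the sensitivity range
upper = smub(1.0, **inputs)   # other end of the sensitivity range
print(f"SMUB(0.5) = {point:.3f}, range = [{lower:.3f}, {upper:.3f}]")
```

Under this assumed form, the central value phi = 0.5 reduces the index to the standardized difference in proxy means, and a weaker correlation between proxy and outcome widens the interval between the values at phi = 0 and phi = 1.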