Rollins School of Public Health, Emory University, Atlanta, GA, United States.
JMIR Public Health Surveill. 2022 Sep 9;8(9):e37887. doi: 10.2196/37887.
Surveillance data are essential public health resources for guiding policy and allocation of human and capital resources. These data often consist of large collections of information based on nonrandom sample designs. Population estimates based on such data may be biased when the distribution of the underlying sample differs from that of the true population of interest. In this study, we simulate a population of interest and allow response rates to vary in nonrandom ways to illustrate and measure the effect this has on population-based estimates of an important public health policy outcome.
The aim of this study was to illustrate the effect of nonrandom missingness on population-based survey sample estimation.
We simulated a population of respondents answering a survey question about their satisfaction with their community's policy regarding vaccination mandates for government personnel. We allowed response rates to differ between the generally satisfied and dissatisfied and considered the effect of common efforts to control for potential bias such as sampling weights, sample size inflation, and hypothesis tests for determining missingness at random. We compared these conditions via mean squared errors and sampling variability to characterize the bias in estimation arising under these different approaches.
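The simulation design described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the satisfaction distribution `probs`, the per-category `response_rates`, the population size, and the sample sizes are all hypothetical values chosen for the example and do not come from the study. The sketch draws a 5-point Likert population, lets the probability of responding depend on the answer itself, and reports the mean squared error of the respondent mean at two invited-sample sizes.

```python
import random

LIKERT = [1, 2, 3, 4, 5]

def simulate_population(n, probs, rng):
    """Draw n Likert answers (1-5) with the given category probabilities."""
    return rng.choices(LIKERT, weights=probs, k=n)

def observe(invited, response_rates, rng):
    """Keep each invited answer with a probability that depends on the answer."""
    return [y for y in invited if rng.random() < response_rates[y]]

def mean(values):
    return sum(values) / len(values)

def mse_of_respondent_mean(population, true_mean, response_rates,
                           sample_size, reps, rng):
    """Mean squared error of the respondent mean over repeated surveys."""
    squared_errors = []
    for _ in range(reps):
        invited = rng.sample(population, sample_size)
        observed = observe(invited, response_rates, rng)
        if observed:  # skip the (very unlikely) survey with no respondents
            squared_errors.append((mean(observed) - true_mean) ** 2)
    return mean(squared_errors)

rng = random.Random(2022)

# Hypothetical values: a mostly dissatisfied population in which
# satisfied people are much more likely to answer the survey.
probs = [0.30, 0.25, 0.15, 0.20, 0.10]
response_rates = {1: 0.10, 2: 0.15, 3: 0.30, 4: 0.50, 5: 0.60}

population = simulate_population(50_000, probs, rng)
true_mean = mean(population)

# Same response mechanism, tenfold larger invited sample.
mse_small = mse_of_respondent_mean(population, true_mean, response_rates,
                                   sample_size=500, reps=100, rng=rng)
mse_large = mse_of_respondent_mean(population, true_mean, response_rates,
                                   sample_size=5_000, reps=100, rng=rng)

print(f"true mean: {true_mean:.2f}")
print(f"MSE, 500 invited:   {mse_small:.3f}")
print(f"MSE, 5,000 invited: {mse_large:.3f}")
```

Under these assumed values the two errors should come out close to each other, because a larger sample reduces sampling variability but leaves the squared-bias part of the mean squared error unchanged.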
Sample estimates show clear and quantifiable bias, even in the most favorable response profile. On a 5-point Likert scale, nonrandom missingness resulted in errors averaging almost a full point away from the truth. Efforts to mitigate bias through sample size inflation and sampling weights have negligible effects on the overall results. Additionally, hypothesis testing for departures from random missingness rarely detects the nonrandom missingness across the widest range of response profiles considered.
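One way to see why sample size inflation has so little effect is to write out the large-sample value of the respondent mean. The notation here is an assumption introduced for illustration and does not appear in the original: $p_k$ is the population share answering $k$ on the 5-point scale, and $r_k$ is the probability that a person with answer $k$ responds.

```latex
\begin{align}
  \mu &= \sum_{k=1}^{5} k\,p_k
      && \text{(true population mean)} \\
  \mathbb{E}\!\left[\bar{y}_{\mathrm{resp}}\right]
      &\approx \frac{\sum_{k=1}^{5} k\,p_k\,r_k}{\sum_{k=1}^{5} p_k\,r_k}
      && \text{(large-sample respondent mean)} \\
  \mathrm{bias}
      &\approx \frac{\sum_{k=1}^{5} k\,p_k\,r_k}{\sum_{k=1}^{5} p_k\,r_k}
         - \sum_{k=1}^{5} k\,p_k
\end{align}
```

The sample size does not appear in the bias term, so inviting more people shrinks only the variance. The bias is zero when all $r_k$ are equal, that is, when responding does not depend on the answer.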
Our results suggest that assuming surveillance data are missing at random during analysis could provide estimates that are markedly different from what we might see in the whole population. Policy decisions based on such potentially biased estimates could be devastating in terms of community disengagement and health disparities. Alternative approaches to analysis that move away from broad generalization of a mismeasured population at risk are necessary to identify marginalized groups, whose overall responses may be very different from those observed in measured respondents.