Xi Wenna, Hinton Alice, Lu Bo, Krotki Karol, Keller-Hamilton Brittney, Ferketich Amy, Sukasih Amang
Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
College of Public Health, The Ohio State University, Columbus, OH, USA.
Commun Stat Simul Comput. 2024;53(7):3285-3301. doi: 10.1080/03610918.2022.2102181. Epub 2022 Jul 25.
In scientific studies with low-prevalence outcomes, probability sampling may be supplemented by nonprobability sampling to boost the sample size of a desired subpopulation while remaining representative of the entire study population. To use probability and nonprobability samples together appropriately, several methods have been proposed in the literature for generating pseudo-weights, including ad-hoc weights, inclusion probability adjusted weights, and propensity score adjusted weights. We empirically compare various weighting strategies in an extensive simulation study in which probability and nonprobability samples are combined. Weight normalization and raking adjustment are also considered. Our simulation results suggest that the unity weight method (with weight normalization) and the inclusion probability adjusted weight method both perform very well overall. This work is motivated by the Buckeye Teen Health Study, which examines risk factors for the initiation of smoking among teenage males in Ohio. To address the low response rate in the initial probability sample and the low prevalence of smokers in the target population, a small convenience sample was collected as a supplement. Our proposed method yields estimates very close to those from the analysis using only the probability sample, with the additional benefit of being able to track more teens with risky behaviors through follow-ups.
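The abstract names weight normalization and raking adjustment without spelling them out. As a rough illustration only, the sketch below implements textbook raking (iterative proportional fitting to known population margins) and a simple normalization that rescales weights to sum to a population size N. The function names `normalize` and `rake`, the choice to normalize to N, the unity starting weight of 1 for nonprobability units, and the toy data are all assumptions for illustration, not the authors' implementation or the paper's exact definitions.

```python
import numpy as np


def normalize(weights, population_size):
    """Rescale weights so they sum to population_size (assumed normalization target)."""
    w = np.asarray(weights, dtype=float)
    return w * (population_size / w.sum())


def rake(weights, margins, targets, max_iter=1000, tol=1e-10):
    """Rake weights to known population margins by iterative proportional fitting.

    margins[k] holds each unit's category code (0, 1, ...) on raking variable k;
    targets[k][c] is the known population total for category c of variable k.
    Assumes every category is represented in the sample (no zero weighted totals).
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for cats, tot in zip(margins, targets):
            tot = np.asarray(tot, dtype=float)
            # Current weighted total in each category of this variable.
            current = np.bincount(cats, weights=w, minlength=len(tot))
            factor = tot / current
            w *= factor[cats]  # scale each unit by its category's factor
            largest_adjustment = max(largest_adjustment, np.max(np.abs(factor - 1.0)))
        if largest_adjustment < tol:  # every margin was already matched this sweep
            break
    return w


# Toy combined sample (hypothetical): 4 probability units carry design
# weights, 2 nonprobability units start with a unity weight of 1.
w0 = np.array([50.0, 50.0, 40.0, 40.0, 1.0, 1.0])
sex = np.array([0, 0, 1, 1, 0, 1])
smoker = np.array([0, 1, 0, 1, 1, 0])
N = 200.0

w_norm = normalize(w0, N)
raked = rake(w_norm, [sex, smoker], [[100.0, 100.0], [150.0, 50.0]])
print(np.round(raked, 2))
```

Raking only redistributes weight across categories of the raking variables; it cannot by itself correct selection bias in the nonprobability units on variables that are not raked, which is one reason a comparison such as the one in this study is needed.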