Si Yajuan
University of Michigan.
Stat Sci. 2025 May;40(2):272-288. doi: 10.1214/24-sts932. Epub 2025 Jun 2.
Multilevel regression and poststratification (MRP) is a popular method for addressing selection bias in subgroup estimation, with broad applications across fields from social sciences to public health. In this paper, we examine the inferential validity of MRP in finite populations, exploring the impact of poststratification and model specification. The success of MRP relies heavily on the availability of auxiliary information that is strongly related to the outcome. To improve the fit of the outcome model, we recommend modeling the inclusion probabilities conditionally on auxiliary variables and incorporating flexible functions of the estimated inclusion probabilities as predictors in the mean structure. We present a statistical data integration framework that offers robust inferences for probability and nonprobability surveys, addressing various challenges in practical applications. Our simulation studies support the statistical validity of MRP, which involves a tradeoff between bias and variance and, compared with alternative methods, offers greater benefits for subgroup estimates with small sample sizes. We apply our methods to the Adolescent Brain Cognitive Development (ABCD) Study, which collected information on children across 21 geographic locations in the U.S. to provide national representation, but is subject to selection bias as a nonprobability sample. We focus on the cognition measure of diverse groups of children in the ABCD study and show that the use of auxiliary variables affects the findings on cognitive performance.
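As an illustration of the two ingredients named in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes a simple partial-pooling rule with known variance components as a stand-in for a fitted multilevel regression, and it uses the pooled sample mean as the shrinkage target. The function names (`shrunken_cell_means`, `poststratify`) and the parameters `tau2` and `sigma2` are hypothetical and chosen only for this example.

```python
from statistics import mean


def shrunken_cell_means(samples, tau2, sigma2):
    """Partially pool each cell's sample mean toward the pooled sample mean.

    Assumption: a stand-in for a fitted multilevel model, with the
    between-cell variance (tau2) and within-cell variance (sigma2)
    treated as known rather than estimated.
    """
    grand = mean(y for ys in samples.values() for y in ys)
    estimates = {}
    for cell, ys in samples.items():
        n = len(ys)
        # Smaller cells get a smaller weight on their own mean,
        # so they are pulled more strongly toward the pooled mean.
        w = tau2 / (tau2 + sigma2 / n)
        estimates[cell] = w * mean(ys) + (1 - w) * grand
    return estimates


def poststratify(cell_estimates, cell_pop_counts, cells=None):
    """Weight cell-level estimates by known population cell counts.

    Passing a subset of cells gives a subgroup estimate.
    """
    if cells is None:
        cells = list(cell_estimates)
    total = sum(cell_pop_counts[c] for c in cells)
    return sum(cell_pop_counts[c] * cell_estimates[c] for c in cells) / total


# Toy data (made up for illustration): cell "b" is under-sampled
# relative to its share of the population.
samples = {"a": [1.0, 1.0], "b": [3.0]}
pop_counts = {"a": 100, "b": 300}

cell_est = shrunken_cell_means(samples, tau2=1.0, sigma2=1.0)
overall = poststratify(cell_est, pop_counts)
```

The shrinkage step trades a little bias for lower variance in sparse cells, and the poststratification step corrects for the mismatch between sample and population cell sizes; this is the bias-variance tradeoff the abstract refers to, in its simplest form.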