The Biostatistics Center, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Rockville, Maryland, USA.
Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
Stat Med. 2022 Feb 20;41(4):769-785. doi: 10.1002/sim.9260. Epub 2021 Nov 16.
Missing data are common in longitudinal cohort studies and can lead to bias, particularly in studies with informative missingness. Many common methods for handling informatively missing data in survey samples require correctly specifying a model for missingness. Although doubly robust methods exist to provide unbiased regression coefficients in the presence of missing outcome data, these methods do not account for correlation due to clustering inherent in longitudinal or cluster-sampled studies. In this work, we developed a doubly robust method to estimate the regression of an outcome on a predictor in the presence of missing multilevel data on the outcome, which results in consistent estimation of regression coefficients assuming correct specification of either (1) the probability of missingness or (2) the outcome model. This method involves specification of separate hierarchical models for missingness and for the outcome, conditional on observed auxiliary variables and cluster-specific random effects, to account for correlation among observations. We showed this proposed estimator is doubly robust and derived its asymptotic distribution, conducted simulation studies to compare the method to an existing doubly robust method developed for independent data, and applied the method to data from the China Health and Nutrition Survey, an ongoing multilevel longitudinal cohort study.
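As a rough illustration of the double-robustness property described above, the following minimal sketch implements the standard augmented inverse probability weighted (AIPW) estimator of an outcome mean for independent data. It is not the multilevel estimator proposed in the paper: it omits the cluster-specific random effects and hierarchical models, and it targets a mean rather than regression coefficients. The simulated data, the function names `fit_logistic` and `aipw_mean`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)                      # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_mean(y, r, X_miss, X_out):
    """AIPW estimate of E[Y] when y is observed only where r == 1.

    X_miss: design matrix for the model of the probability of being observed.
    X_out:  design matrix for the linear outcome model.
    """
    obs = r == 1
    # Model 1: probability that the outcome is observed.
    alpha = fit_logistic(X_miss, r)
    pi_hat = 1.0 / (1.0 + np.exp(-X_miss @ alpha))
    # Model 2: outcome regression fitted on observed cases, predicted for everyone.
    gamma, *_ = np.linalg.lstsq(X_out[obs], y[obs], rcond=None)
    m_hat = X_out @ gamma
    # Replace missing outcomes by 0; they are multiplied by r == 0 below.
    y_filled = np.where(obs, y, 0.0)
    return np.mean(m_hat + r * (y_filled - m_hat) / pi_hat)

# Hypothetical simulated data: missingness depends on the auxiliary variable z.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
y_full = 1.0 + 2.0 * z + rng.normal(size=n)       # true mean of Y is 1.0
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + z)))
r = (rng.uniform(size=n) < p_obs).astype(float)
y = np.where(r == 1, y_full, np.nan)

X = np.column_stack([np.ones(n), z])              # correctly specified design
X_bad = np.ones((n, 1))                           # intercept only: misspecified outcome model

est_both = aipw_mean(y, r, X, X)                  # both models correct
est_bad_outcome = aipw_mean(y, r, X, X_bad)       # only the missingness model correct
naive = np.nanmean(y)                             # complete-case mean, expected to be biased

print(est_both, est_bad_outcome, naive)
```

In this sketch the estimate should stay close to the true mean when the outcome model is deliberately misspecified but the missingness model is correct, while the complete-case mean is biased upward because larger values of `z` are more likely to be observed.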