Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia.
Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Victoria, Australia.
Biom J. 2021 Feb;63(2):354-371. doi: 10.1002/bimj.201900360. Epub 2020 Oct 25.
Many analyses of longitudinal cohorts require sampling weights to account for participants' unequal sampling probabilities, as well as multiple imputation (MI) to handle missing data. However, there is no guidance on how MI and sampling weights should be implemented together. We simulated a target population based on the Australian Bureau of Statistics Estimated Resident Population and drew 1000 random samples dependent on three design variables to mimic the Longitudinal Study of Australian Children. The target analysis was the weighted prevalence of overweight/obesity over childhood. We evaluated the performance of several MI approaches available in Stata, each based on multivariate normal imputation (MVNI), fully conditional specification (FCS) or twofold FCS: using a weighted imputation model; imputing missing data separately within each sampling-weight quintile group; including the design stratum indicator in the imputation model; and including the sampling weight as a covariate in the imputation model. Approaches based on available cases and on inverse probability weighting (IPW) with time-varying weights were also compared. We observed severe convergence issues with FCS and twofold FCS. All MVNI-based approaches performed similarly, producing minimal bias and nominal coverage, except when imputation was conducted separately for each sampling-weight quintile group. IPW performed as well as the MVNI-based approaches in terms of bias but was less precise. In similar longitudinal studies, we recommend using MVNI with the design stratum as a covariate in the imputation model. If the design stratum is unknown, including the sampling weight as a covariate is an appropriate alternative.
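To make the recommended workflow concrete, the following is a minimal Python sketch, not the authors' Stata implementation. It imputes a continuous BMI z-score with the design stratum as a covariate, computes the weighted prevalence in each imputed dataset, and pools the results with Rubin's rules. Everything specific here is an assumption made for illustration: the function names, the single-variable bootstrap-based regression imputation standing in for MVNI, the cut-off `bmi_z > 1` for overweight/obesity, the simplified linearization variance that ignores stratification, and the simulated data.

```python
import numpy as np


def weighted_prevalence(y, w):
    """Weighted proportion of a 0/1 outcome y using sampling weights w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def prevalence_variance(y, w):
    """Linearization variance of the weighted proportion.

    Simplifying assumption: with-replacement sampling, strata ignored.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    p = np.sum(w * y) / np.sum(w)
    z = w * (y - p)
    return float(n / (n - 1) * np.sum(z ** 2) / np.sum(w) ** 2)


def impute_once(bmi_z, covariates, rng):
    """One stochastic regression imputation of bmi_z on the covariates.

    The model is refitted on a bootstrap resample of the observed cases so
    that parameter uncertainty is reflected across imputations. This is a
    stand-in for MVNI, not a reimplementation of it.
    """
    bmi_z = np.asarray(bmi_z, dtype=float)
    obs = ~np.isnan(bmi_z)
    X = np.column_stack([np.ones(len(bmi_z)), covariates])
    idx = rng.choice(np.flatnonzero(obs), size=int(obs.sum()), replace=True)
    beta, *_ = np.linalg.lstsq(X[idx], bmi_z[idx], rcond=None)
    resid = bmi_z[idx] - X[idx] @ beta
    sigma = resid.std(ddof=X.shape[1])
    filled = bmi_z.copy()
    filled[~obs] = X[~obs] @ beta + rng.normal(0.0, sigma, int((~obs).sum()))
    return filled


def rubin_pool(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    between = q.var(ddof=1)                      # between-imputation variance
    total = u.mean() + (1.0 + 1.0 / m) * between  # total variance
    return float(q.mean()), float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(2021)
    n, m = 2000, 20

    # Simulated sample (illustrative only): stratum drives the outcome,
    # the sampling weight, and the chance of being missing.
    stratum = rng.integers(0, 3, size=n).astype(float)
    weight = 1.0 + stratum
    bmi_z = 0.3 * stratum + rng.normal(0.0, 1.0, size=n)
    missing = rng.random(n) < (0.1 + 0.1 * stratum)
    bmi_z_obs = np.where(missing, np.nan, bmi_z)

    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(bmi_z_obs, stratum, rng)
        overweight = (completed > 1.0).astype(float)  # hypothetical cut-off
        estimates.append(weighted_prevalence(overweight, weight))
        variances.append(prevalence_variance(overweight, weight))

    pooled, total_var = rubin_pool(estimates, variances)
    print(f"Pooled weighted prevalence: {pooled:.3f} (SE {total_var ** 0.5:.3f})")
```

To mirror the alternative for an unknown design stratum, `weight` can be passed to `impute_once` in place of `stratum`; the analysis step stays weighted in either case.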