Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium.
Open Analytics NV, Antwerp, Belgium.
Stat Med. 2023 Dec 20;42(29):5405-5418. doi: 10.1002/sim.9919. Epub 2023 Sep 27.
Imputation of longitudinal categorical covariates with several waves and many predictors is complicated by implausible transitions, collinearity, and overfitting. We designed a simulation study with data obtained from a general practitioners' morbidity registry in Belgium for three waves, with smoking as the longitudinal covariate of interest. We set data on smoking to missing completely at random and missing not at random, with proportions of missingness equal to 10%, 30%, 50%, and 70%. This study proposes a 3-stage approach that allows flexibility when imputing time-dependent categorical covariates. First, multiple imputation using fully conditional specification or multiple imputation for the predictor variables was deployed using the wide format, so that previous and future information on the same patient was utilized. Second, a joint Markov transition model for initial, forward, backward, and intermittent probabilities was developed for each imputed dataset. Finally, this transition model was used for imputation. We compared the performance of this methodology with an analysis of the complete data and with listwise deletion in terms of bias and root mean square error. Next, we applied this methodology in a clinical case for the years 2017 to 2021, where we estimated the effect of several covariates on pneumococcal vaccination. This methodological framework ensures that the plausibility of transitions is preserved, that overfitting and collinearity issues are resolved, and that confounders can be utilized. Finally, a companion R package was developed to enable replication and easy application of this methodology.
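To make the role of initial, forward, backward, and intermittent probabilities concrete, the following is a minimal Python sketch, not the authors' model or their R package. It assumes a covariate-free, time-homogeneous first-order Markov chain over three smoking states; the state labels, the `INITIAL` and `TRANSITION` values, and the function names `state_weights` and `impute_sequence` are all hypothetical and chosen only for illustration. Zero entries in the transition matrix stand in for implausible transitions (for example, a current smoker becoming a never smoker), so such states can never be drawn.

```python
import numpy as np

# Hypothetical state coding: 0 = never, 1 = current, 2 = former smoker.
STATES = ["never", "current", "former"]

# Illustrative numbers only; they are NOT estimates from the study.
INITIAL = np.array([0.5, 0.3, 0.2])
TRANSITION = np.array([
    [0.9, 0.1, 0.0],  # never   -> never / current
    [0.0, 0.8, 0.2],  # current -> current / former
    [0.0, 0.1, 0.9],  # former  -> current / former
])


def state_weights(seq, t):
    """Distribution of the state at wave t given its neighbours.

    seq holds state indices, with None for missing waves. Wave t - 1
    must already be filled in whenever t > 0.
    """
    # Initial probabilities at the first wave, forward transition otherwise.
    prior = INITIAL if t == 0 else TRANSITION[seq[t - 1]]
    weights = prior.copy()
    # Backward / intermittent part: condition on the next observed wave,
    # k steps ahead, through the k-step transition matrix.
    for k in range(1, len(seq) - t):
        nxt = seq[t + k]
        if nxt is not None:
            k_step = np.linalg.matrix_power(TRANSITION, k)
            weights = weights * k_step[:, nxt]
            break
    total = weights.sum()
    if total == 0:
        raise ValueError("observed sequence is impossible under TRANSITION")
    return weights / total


def impute_sequence(seq, rng):
    """Fill missing waves in order, drawing each from state_weights."""
    filled = list(seq)
    for t, state in enumerate(filled):
        if state is None:
            probs = state_weights(filled, t)
            filled[t] = int(rng.choice(len(STATES), p=probs))
    return filled
```

Calling `impute_sequence([1, None, 2], np.random.default_rng(0))` repeatedly with different generators would give several completed sequences, in the spirit of multiple imputation. In this sketch the middle wave can only be drawn as "current" or "former", because "never" has zero weight between two smoking-related states.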