Shen Chung-Wei, Chen Yi-Hau
Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan.
Biom J. 2013 Nov;55(6):899-911. doi: 10.1002/bimj.201200236. Epub 2013 Aug 23.
Longitudinal data are often subject to missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) has been widely used for the analysis of incomplete longitudinal data. In particular, the MI-GEE method has been proposed for inference with generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the "quasi-likelihood under the independence model criterion" (QIC) and the "missing longitudinal information criterion" (MLIC), to accommodate multiply imputed datasets for selection of the MI-GEE mean model. Based on real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with a moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI-GEE analysis; (ii) the MI-based GEE model selection methods with a suitable number of imputations generally perform well, while naive application of existing model selection methods that simply ignores missing observations may lead to very poor performance; (iii) the model selection criteria based on improper (frequentist) multiple imputation generally perform better than their analogues based on proper (Bayesian) multiple imputation.
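To make the idea of QIC-based mean-model selection over multiply imputed data concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state how the criteria are combined across imputations, so averaging the per-dataset QIC is an assumption made here for illustration. The QIC itself follows the usual form, -2Q + 2 tr(Omega_I V_R), for a Gaussian mean model with identity link under working independence. The toy data, the crude stochastic regression imputation, and the names `gaussian_qic`, `mi_qic`, `candidates`, and `scores` are all hypothetical.

```python
import numpy as np

def gaussian_qic(X, y, cluster_ids, phi):
    """QIC for a Gaussian GEE (identity link, independence working correlation)."""
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)      # GEE under independence reduces to OLS
    resid = y - X @ beta
    quasi_lik = -np.sum(resid ** 2) / (2.0 * phi)
    omega_i = XtX / phi                       # model-based information under independence
    bread = np.linalg.inv(XtX)
    meat = np.zeros_like(XtX)
    for c in np.unique(cluster_ids):          # cluster-robust (sandwich) covariance
        idx = cluster_ids == c
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    v_robust = bread @ meat @ bread
    return -2.0 * quasi_lik + 2.0 * np.trace(omega_i @ v_robust)

def mi_qic(imputed, columns, phi):
    """Average QIC over imputed datasets (ASSUMED combining rule, not from the paper)."""
    return float(np.mean([gaussian_qic(X[:, columns], y, ids, phi)
                          for X, y, ids in imputed]))

# --- Hypothetical toy data: 200 subjects, 4 visits each ---
rng = np.random.default_rng(0)
n_subj, n_visits, n_imputations = 200, 4, 20
ids = np.repeat(np.arange(n_subj), n_visits)
time = np.tile(np.arange(n_visits), n_subj).astype(float)
treat = np.repeat(rng.integers(0, 2, n_subj), n_visits).astype(float)
noise_cov = rng.normal(size=ids.size)         # covariate unrelated to the response
X = np.column_stack([np.ones(ids.size), time, treat, noise_cov])
subj_effect = np.repeat(rng.normal(scale=0.5, size=n_subj), n_visits)
y = 1.0 + 0.5 * time + 1.0 * treat + subj_effect + rng.normal(size=ids.size)
missing = rng.random(ids.size) < 0.2          # intermittent (nonmonotone) missingness

# Crude stochastic regression imputation; a stand-in for a real imputation model.
obs = ~missing
beta_obs = np.linalg.lstsq(X[obs], y[obs], rcond=None)[0]
sd_obs = np.std(y[obs] - X[obs] @ beta_obs, ddof=X.shape[1])
imputed = []
for _ in range(n_imputations):
    y_m = y.copy()
    y_m[missing] = X[missing] @ beta_obs + rng.normal(scale=sd_obs, size=missing.sum())
    imputed.append((X, y_m, ids))

# Fix the scale parameter from the largest candidate model, pooled over imputations,
# so that all candidate models are compared on the same scale.
phi = float(np.mean([
    np.sum((y_m - X_m @ np.linalg.lstsq(X_m, y_m, rcond=None)[0]) ** 2)
    / (len(y_m) - X_m.shape[1])
    for X_m, y_m, _ in imputed
]))

# Candidate mean models, given as column indices into X.
candidates = {"full": [0, 1, 2, 3], "true": [0, 1, 2], "reduced": [0, 1]}
scores = {name: mi_qic(imputed, cols, phi) for name, cols in candidates.items()}
best = min(scores, key=scores.get)            # smaller QIC is preferred
```

Rerunning the sketch with a small `n_imputations` and different seeds is one way to see why the abstract stresses that more than a few imputed datasets are needed: the averaged criterion varies more from run to run when few imputations are used.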