Alosh Mohamed
Division of Biometrics III, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA.
J Biopharm Stat. 2009 Nov;19(6):1039-54. doi: 10.1080/10543400903242787.
The impact of the missing data mechanism on estimates of model parameters for continuous data has been extensively investigated in the literature. In comparison, little research has been carried out on the impact of missing count data. The focus of this article is the impact of missing data on a transition model for longitudinal count data, termed the generalized autoregressive model of order 1. The model has several features, including modeling dependence and accounting for overdispersion in the data, that make it appealing for the clinical trial setting. Furthermore, the model can be viewed as a natural extension of the commonly used log-linear model. Following an introduction of the model and a discussion of its estimation, we investigate the impact of different missing data mechanisms on estimates of the model parameters through a simulation experiment. The simulation findings show that, as in the case of normally distributed data, estimates under the missing completely at random (MCAR) and missing at random (MAR) mechanisms are close to those from the full dataset, and that the missing not at random (MNAR) mechanism yields the greatest bias. Furthermore, estimates based on imputing missing data by the last observation carried forward (LOCF) under the MAR assumption are similar to those obtained under MAR. This latter finding might be attributed to the Markov property underlying the model and to the high level of dependence among successive observations used in the simulation experiment. Finally, we consider an application of the generalized autoregressive model to a longitudinal epilepsy dataset analyzed in the literature.
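To make the three missing data mechanisms and LOCF concrete, the following is a minimal sketch of this kind of simulation experiment. It is not the article's model or code: the abstract does not give the form of the generalized autoregressive model, so a plain Poisson INAR(1) process with binomial thinning is swapped in as a stand-in, and parameters are estimated by conditional least squares on consecutive observed pairs. All function names, the logistic dropout model, and the parameter values (alpha, lam, base, slope) are hypothetical choices made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def thin(rng, count, alpha):
    """Binomial thinning: each of `count` events survives with probability alpha."""
    return sum(rng.random() < alpha for _ in range(count))


def simulate_subject(rng, n_visits, alpha=0.7, lam=2.0):
    """Stand-in count model (assumption): y_t = thin(y_{t-1}) + Poisson(lam)."""
    y = [poisson(rng, lam / (1 - alpha))]  # start at the stationary mean
    for _ in range(n_visits - 1):
        y.append(thin(rng, y[-1], alpha) + poisson(rng, lam))
    return y


def apply_dropout(rng, y, mechanism, base=-2.5, slope=0.15):
    """Monotone dropout: values from the dropout visit onward become None."""
    obs = list(y)
    for t in range(1, len(y)):
        if mechanism == "MCAR":
            driver = 0          # dropout ignores the data entirely
        elif mechanism == "MAR":
            driver = y[t - 1]   # depends on the last observed count
        elif mechanism == "MNAR":
            driver = y[t]       # depends on the count that would go missing
        else:
            raise ValueError(mechanism)
        p_drop = 1 / (1 + math.exp(-(base + slope * driver)))
        if rng.random() < p_drop:
            obs[t:] = [None] * (len(y) - t)
            break
    return obs


def locf(obs):
    """Fill each missing visit with the last observed value carried forward."""
    filled = list(obs)
    for t in range(1, len(filled)):
        if filled[t] is None:
            filled[t] = filled[t - 1]
    return filled


def cls_estimates(series):
    """Least-squares fit of y_t on y_{t-1} over observed pairs: (alpha, lam)."""
    pairs = [(s[t - 1], s[t]) for s in series for t in range(1, len(s))
             if s[t - 1] is not None and s[t] is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    alpha_hat = sxy / sxx
    return alpha_hat, my - alpha_hat * mx


if __name__ == "__main__":
    rng = random.Random(2009)
    full = [simulate_subject(rng, 6) for _ in range(2000)]
    print("full data", cls_estimates(full))
    for mech in ("MCAR", "MAR", "MNAR"):
        observed = [apply_dropout(rng, y, mech) for y in full]
        print(mech, cls_estimates(observed))
        if mech == "MAR":
            print("MAR + LOCF", cls_estimates([locf(o) for o in observed]))
```

Comparing the printed estimates against the full-data line gives a rough feel for how each mechanism shifts the fitted dependence and innovation parameters. Because the model, estimator, and dropout probabilities here are illustrative assumptions, the output should not be read as a reproduction of the article's results.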