Kim H, Kriebel D
University of Massachusetts Lowell, Lowell, Massachusetts, USA.
Occup Environ Med. 2009 Nov;66(11):733-9. doi: 10.1136/oem.2008.042887. Epub 2009 Aug 16.
Poisson regression is now widely used in epidemiology, but researchers do not always evaluate the potential for bias in this method when the data are overdispersed. This study used simulated data to evaluate sources of overdispersion in public health surveillance data and to compare alternative statistical models for analysing such data. If count data are overdispersed, Poisson regression will underestimate the variance, so its standard errors will be too small. The negative binomial 2 (NB2) model can accommodate overdispersion and may be preferred for the analysis of count data. This paper compared the performance of Poisson and NB2 regression on simulated overdispersed injury surveillance data.
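For reference, the difference between the two models can be written as a difference in their variance functions. The notation below (mean \mu_i, dispersion parameter \alpha, offset t_i) is an assumption introduced here for illustration and does not appear in the abstract; the quadratic form is the standard definition of NB2.

```latex
% Shared log-linear mean model with offset t_i (e.g. person-time at risk)
\log \mu_i = \log t_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}

% Poisson: variance equals the mean
\operatorname{Var}(Y_i \mid \mathbf{x}_i) = \mu_i

% NB2: variance is quadratic in the mean, with \alpha \ge 0
\operatorname{Var}(Y_i \mid \mathbf{x}_i) = \mu_i + \alpha \mu_i^{2}
```

Setting \alpha = 0 recovers the Poisson variance. Both models share the same mean model, so NB2 changes the variance and the resulting confidence intervals but cannot repair a misspecified mean.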
Monte Carlo simulation was used to assess the utility of the NB2 regression model as an alternative to Poisson regression for data with several different sources of overdispersion. Simulated injury surveillance datasets were created in which an important predictor variable was omitted or an incorrect offset (denominator) was used. The simulations evaluated the ability of Poisson and NB2 regression to correctly estimate the true determinants of injury and their confidence intervals.
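The following is a minimal sketch of how omitting a predictor can generate overdispersion in simulated count data. It is not the authors' simulation design: the function names, sample size, coefficients, and person-year range are all hypothetical values chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def simulate_injury_counts(n=5000, beta0=-4.0, beta_x=0.5, beta_z=1.0, seed=0):
    """Simulate injury counts whose true rate depends on x and z.

    All parameter values are illustrative assumptions, not the paper's.
    Returns the counts, the true mean given (x, z), and the mean given x only.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # predictor kept in the model
    z = rng.normal(size=n)                      # predictor that will be omitted
    person_years = rng.uniform(50.0, 500.0, size=n)  # offset (denominator)

    # Conditional on x and z, the counts are exactly Poisson.
    mu_full = person_years * np.exp(beta0 + beta_x * x + beta_z * z)
    y = rng.poisson(mu_full)

    # Mean once z is integrated out: E[exp(beta_z * z)] = exp(beta_z**2 / 2)
    # for a standard normal z that is independent of x.
    mu_omit = person_years * np.exp(beta0 + beta_x * x + beta_z**2 / 2.0)
    return y, mu_full, mu_omit


def pearson_ratio(y, mu):
    """Average squared Pearson residual; close to 1 if Var(y) equals mu."""
    return float(np.mean((y - mu) ** 2 / mu))


if __name__ == "__main__":
    y, mu_full, mu_omit = simulate_injury_counts()
    print("dispersion with z in the mean:", round(pearson_ratio(y, mu_full), 2))
    print("dispersion with z omitted:    ", round(pearson_ratio(y, mu_omit), 2))
```

With z included in the mean, the ratio should be close to 1. With z omitted, the unexplained variation in the rate inflates the variance well above the mean, which is the pattern the NB2 variance function is designed to absorb.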
The NB2 model was effective in reducing overdispersion, but it could not reduce bias in point estimates that resulted from omitting a confounding covariate, nor could it reduce bias from using an incorrect offset. One advantage of NB2 over Poisson for overdispersed data was that the confidence interval for a covariate was considerably wider with NB2, providing an indication that the Poisson model did not fit well.
When overdispersion is detected in a Poisson regression model, the NB2 model should be fit as an alternative. If there is no longer overdispersion, then the NB2 results may be preferred. However, it is important to remember that NB2 cannot correct for bias from omitted covariates or from using an incorrect offset.
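The detection step can be illustrated with a short sketch that fits a Poisson regression and computes the Pearson dispersion statistic. It also shows the bias in the point estimate when the omitted covariate is a confounder. The fitting routine is a plain iteratively reweighted least squares implementation written for this example, and every name and parameter value is a hypothetical choice, not taken from the paper. NB2 is not fitted here; in practice an established statistics package would be used for both models.

```python
import numpy as np


def fit_poisson(X, y, offset=None, n_iter=100, tol=1e-10):
    """Log-link Poisson regression by iteratively reweighted least squares.

    Assumes the first column of X is an intercept. Returns (beta, fitted mean).
    """
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    # Start from the intercept-only maximum likelihood estimate.
    beta[0] = np.log(y.sum() / np.exp(offset).sum())
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        working = eta - offset + (y - mu) / mu   # working response
        XtW = X.T * mu                            # IRLS weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ working)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, np.exp(X @ beta + offset)


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square over residual degrees of freedom (about 1 if Poisson)."""
    return float(np.sum((y - mu) ** 2 / mu) / (len(y) - n_params))


def simulate(n=20000, rho=0.0, seed=1):
    """Counts with true log-rate 0.5 + 0.5*x + 0.8*z; corr(x, z) = rho."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    y = rng.poisson(np.exp(0.5 + 0.5 * x + 0.8 * z))
    return x, z, y


if __name__ == "__main__":
    for rho in (0.0, 0.6):
        x, z, y = simulate(rho=rho)
        ones = np.ones_like(x)
        b_full, mu_full = fit_poisson(np.column_stack([ones, x, z]), y)
        b_red, mu_red = fit_poisson(np.column_stack([ones, x]), y)
        print(f"rho={rho}: slope full={b_full[1]:.2f}, slope reduced={b_red[1]:.2f}, "
              f"dispersion full={pearson_dispersion(y, mu_full, 3):.2f}, "
              f"dispersion reduced={pearson_dispersion(y, mu_red, 2):.2f}")
```

In this sketch, omitting z produces a dispersion statistic well above 1 in both scenarios, but the slope for x is biased only when z is correlated with x. Because that bias arises in the mean model, which NB2 shares with Poisson, changing the variance function alone would not be expected to remove it.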