Rose C E, Martin S W, Wannemuehler K A, Plikaytis B D
Bacterial Vaccine-Preventable Diseases Branch, Division of Epidemiology and Surveillance, CDC, Atlanta, Georgia 30333, USA.
J Biopharm Stat. 2006;16(4):463-81. doi: 10.1080/10543400600719384.
We compared several modeling strategies for vaccine adverse event count data characterized by excess zeroes and heteroskedasticity. Count data are routinely modeled using Poisson and Negative Binomial (NB) regression, but zero-inflated and hurdle models may be advantageous in this setting. Here we compared the fit of the Poisson, NB, zero-inflated Poisson (ZIP), zero-inflated Negative Binomial (ZINB), Poisson Hurdle (PH), and Negative Binomial Hurdle (NBH) models. In general, for public health studies, we may conceptualize zero-inflated models as allowing zeroes to arise from both at-risk and not-at-risk populations. In contrast, hurdle models may be conceptualized as having zeroes only from an at-risk population. Our results illustrate that, for our data, the ZINB and NBH models are preferred, but these two models are indistinguishable with respect to fit. Assuming the Poisson and NB models are inadequate because of excess zeroes, the choice between the zero-inflated and hurdle modeling frameworks should generally be based on the study design and purpose. If the study's purpose is inference, then the modeling framework should be considered carefully. For example, if the study design leads to count endpoints with both structural and sample zeroes, then the zero-inflated modeling framework is generally more appropriate; if the endpoint of interest, by design, exhibits only sample zeroes (e.g., at-risk participants), then the hurdle modeling framework is generally preferred. Conversely, if the study's primary purpose is to develop a prediction model, then both the zero-inflated and hurdle modeling frameworks should be adequate.
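To make the distinction between the two frameworks concrete, the following is a minimal sketch of the Poisson versions of each probability mass function, written without covariates. The function names and the parameters `pi` (probability of a structural zero) and `p0` (probability of not crossing the hurdle) are illustrative assumptions, not notation or code from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson (illustrative parameterization).

    With probability pi the observation is a structural zero (not at risk);
    otherwise it is drawn from Poisson(lam), which can also yield a zero
    (a sample zero). Zeroes therefore come from two sources.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_poisson_pmf(k, lam, p0):
    """Poisson hurdle (illustrative parameterization).

    All zeroes come from a single binary process with probability p0;
    positive counts follow a zero-truncated Poisson(lam).
    """
    if k == 0:
        return p0
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p0) * truncated


if __name__ == "__main__":
    lam, pi = 2.0, 0.3  # hypothetical values chosen only for illustration
    # Hurdle zero probability chosen to match the total ZIP zero probability.
    p0 = pi + (1.0 - pi) * math.exp(-lam)
    for k in range(5):
        print(k, zip_pmf(k, lam, pi), hurdle_poisson_pmf(k, lam, p0))
```

In this covariate-free Poisson case, setting `p0` equal to the total zero probability of the ZIP model makes the two distributions coincide, so the difference between the frameworks lies in how the zeroes are interpreted rather than in the fitted distribution. With covariates in both model parts, the two frameworks are no longer exact reparameterizations of each other.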