Lee Andy H, Wang Kui, Scott Jane A, Yau Kelvin K W, McLachlan Geoffrey J
Department of Epidemiology and Biostatistics, School of Public Health, Curtin University of Technology, Perth, WA, Australia.
Stat Methods Med Res. 2006 Feb;15(1):47-61. doi: 10.1191/0962280206sm429oa.
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated by an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
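For orientation, the basic ZIP mixture and an expectation-maximization fit can be sketched as follows. This is a deliberately simplified illustration and not the multi-level model of the paper: it assumes independent observations, no covariates and no random effects, so it only estimates a single zero-inflation probability and a single Poisson mean. The names `zip_pmf`, `fit_zip_em`, `phi` and `lam` are hypothetical and chosen for this sketch.

```python
import math


def zip_pmf(y, phi, lam):
    """ZIP probability mass: a structural zero with probability phi,
    otherwise a Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return (phi if y == 0 else 0.0) + (1.0 - phi) * poisson


def fit_zip_em(counts, n_iter=500, tol=1e-10):
    """EM for an intercept-only ZIP model (assumption: i.i.d. counts,
    at least one non-zero observation). Returns (phi, lam)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)

    # Crude starting values: half of the observed zeros are structural.
    phi = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)

    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        # Non-zero counts cannot be structural zeros, so their weight is 0.
        z = phi / (phi + (1.0 - phi) * math.exp(-lam))
        expected_structural = n_zero * z

        # M-step: update the mixing proportion and the weighted Poisson mean.
        new_phi = expected_structural / n
        new_lam = total / (n - expected_structural)

        converged = abs(new_phi - phi) < tol and abs(new_lam - lam) < tol
        phi, lam = new_phi, new_lam
        if converged:
            break
    return phi, lam


# Made-up illustrative data: 60 zeros among 100 counts.
example_counts = [0] * 60 + [1] * 10 + [2] * 15 + [3] * 10 + [4] * 5
phi_hat, lam_hat = fit_zip_em(example_counts)
```

In the regression setting described in the abstract, `phi` and `lam` would instead depend on covariates through link functions, and the random effects would add further steps to the fit; none of that is attempted here.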