Tang Wan, Lu Naiji, Chen Tian, Wang Wenjuan, Gunzler Douglas David, Han Yu, Tu Xin M
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, U.S.A.
Department of Management, Harbin Institute of Technology, Harbin, China.
Stat Med. 2015 Oct 30;34(24):3235-45. doi: 10.1002/sim.6560. Epub 2015 Jun 15.
Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) models to address excess zeros in the count response. By adding a degenerate distribution centered at 0, interpreted as describing a non-risk group in the population, the ZIP (ZINB) treats the population as a two-component mixture. As with the Poisson and NB models, the key difference between ZIP and ZINB is that the ZINB allows for overdispersion in its NB component, which models the count response of the at-risk group. Overdispersion arising in practice too often does not follow the NB, and applying ZINB to such data yields invalid inference. If the sources of overdispersion are known, other parametric models may be used to model the overdispersion directly. Such models, however, are also subject to distributional assumptions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as with a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390-2405]. Like generalized estimating equations, the proposed approach requires no elaborate distributional assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data.
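The two-component mixture behind the ZIP can be made concrete with a short sketch. With structural-zero (non-risk) probability pi and at-risk Poisson mean lam, P(Y = 0) = pi + (1 - pi)e^(-lam) and P(Y = k) = (1 - pi)lam^k e^(-lam)/k! for k >= 1. This gives E[Y] = (1 - pi)lam and Var[Y] = E[Y](1 + pi*lam), so zero inflation alone already produces overdispersion relative to a plain Poisson. The code below assumes no covariates and lam > 0. The function names are hypothetical, and the intercept-only method-of-moments estimator is a textbook illustration only. It is neither the distribution-free approach proposed in this paper nor the moment-based approach of Yu et al.

```python
import math
import statistics


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) under a zero-inflated Poisson (hypothetical helper; assumes lam > 0)."""
    # Poisson part for the at-risk group, computed in log space for stability.
    poisson = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    if k == 0:
        # Zeros come from both the non-risk group (pi) and the at-risk group.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_moments(pi: float, lam: float) -> tuple[float, float]:
    """Closed-form mean and variance of the ZIP mixture."""
    mean = (1.0 - pi) * lam
    var = mean * (1.0 + pi * lam)  # exceeds the mean whenever pi > 0
    return mean, var


def zip_params_from_moments(mean: float, var: float) -> tuple[float, float]:
    """Invert the moment equations: var/mean - 1 = pi*lam and lam = mean + pi*lam."""
    excess = var / mean - 1.0
    if excess <= 0.0:
        # No overdispersion left to attribute to structural zeros: plain Poisson.
        return 0.0, mean
    lam = mean + excess
    pi = 1.0 - mean / lam
    return pi, lam


def zip_method_of_moments(y: list[int]) -> tuple[float, float]:
    """Intercept-only method-of-moments fit (illustrative; needs >= 2 obs, mean > 0)."""
    return zip_params_from_moments(statistics.mean(y), statistics.variance(y))
```

For example, `zip_moments(0.3, 2.5)` gives a mean of 1.75 and a variance of 3.0625, and feeding those two values to `zip_params_from_moments` recovers pi = 0.3 and lam = 2.5. The variance-to-mean ratio of 1.75 here comes entirely from the structural zeros, because the at-risk component is pure Poisson.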