Untangle the Structural and Random Zeros in Statistical Modelings.
Author information
Tang W, He H, Wang W J, Chen D G
Affiliations
Department of Global Biostatistics & Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70122, USA.
Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70122, USA.
Publication information
J Appl Stat. 2018;45(9):1714-1733. doi: 10.1080/02664763.2017.1391180. Epub 2017 Oct 24.
Count data with structural zeros are common in public health applications. Considerable research has focused on zero-inflated models, such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, for such data when they are used as the response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor, if it is observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias. This paper aims to fill this methodological gap by developing parametric methods, based on the maximum likelihood approach, to model zero-inflated count data used as predictors. The response variable can be of any type, including continuous, binary, count, or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of the new approach when the sample size is small to moderate. A real data example is also used to demonstrate the application of the method.
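The idea of treating an unobserved structural zero as a latent variable inside a likelihood can be illustrated with a small sketch. It is not the authors' implementation: it assumes a ZIP-distributed predictor and a continuous response with normal errors, and the outcome model `y = b0 + b1*x + b2*r + error` (with `r = 1` for a structural zero) as well as all function and parameter names are hypothetical choices made for illustration.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def zip_pmf(x, p, lam):
    """P(X = x) under a ZIP model: structural zero with probability p,
    otherwise a Poisson(lam) draw (which may itself be a random zero)."""
    structural = p if x == 0 else 0.0
    return structural + (1.0 - p) * poisson_pmf(x, lam)


def prob_structural_given_zero(p, lam):
    """Posterior probability that an observed zero is structural."""
    return p / zip_pmf(0, p, lam)


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def joint_loglik(data, p, lam, b0, b1, b2, sd):
    """Joint log-likelihood of (x, y) pairs when the structural-zero
    indicator r is NOT observed.

    Assumed (hypothetical) outcome model:
        y = b0 + b1 * x + b2 * r + N(0, sd^2), r = 1 for a structural zero.
    For x == 0 the unknown r is summed out, giving a two-component mixture.
    """
    total = 0.0
    for x, y in data:
        if x == 0:
            lik = (p * normal_pdf(y, b0 + b2, sd)
                   + (1.0 - p) * poisson_pmf(0, lam) * normal_pdf(y, b0, sd))
        else:
            # A positive count can only come from the Poisson component.
            lik = (1.0 - p) * poisson_pmf(x, lam) * normal_pdf(y, b0 + b1 * x, sd)
        total += math.log(lik)
    return total


# Illustrative values only; these are not taken from the paper.
toy_data = [(0, 3.1), (0, 0.9), (2, 2.2), (4, 2.8)]
print(prob_structural_given_zero(p=0.3, lam=2.0))
print(joint_loglik(toy_data, p=0.3, lam=2.0, b0=1.0, b1=0.5, b2=2.0, sd=1.0))
```

Maximizing `joint_loglik` over its parameters with a numerical optimizer would give maximum likelihood estimates under these assumptions. Fixing `b2 = 0` corresponds to the naive analysis that does not distinguish structural from random zeros.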