Department of Econometrics, Statistics and Applied Econometrics, Riskcenter-IREA, Universitat de Barcelona (UB), Avinguda Diagonal 690, 08034, Barcelona, Spain.
Centre de Recerca Matemàtica, Universitat Autònoma de Barcelona (UAB), 08193, Cerdanyola del Vallès, Spain.
BMC Med Res Methodol. 2021 Dec 12;21(1):277. doi: 10.1186/s12874-021-01427-2.
Zero-inflated models are generally aimed at addressing the problem that arises when the zero values observed in a distribution are generated by two different sources. In practice, this happens because the population studied actually consists of two subpopulations: one in which zero is the default value (structural zeros) and another in which zeros are circumstantial (sampling zeros).
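The two sources of zeros can be made concrete with a short simulation. This is a minimal sketch under assumed notation, which is not necessarily the paper's. `theta` is the probability that an individual belongs to the at-risk subpopulation. `p` is the probability of observing a 1 given that the individual is at risk. Under this notation, P(Y = 1) = θp and P(Y = 0) = (1 - θ) + θ(1 - p). The function name `simulate_zib` is hypothetical.

```python
import numpy as np


def simulate_zib(n, theta, p, seed=0):
    """Simulate zero-inflated Bernoulli data (hypothetical helper).

    theta: probability of being at risk (i.e. not a structural zero)
    p:     probability of y = 1 given that the individual is at risk
    """
    rng = np.random.default_rng(seed)
    at_risk = rng.random(n) < theta                  # membership in the at-risk subpopulation
    y = np.where(at_risk, rng.random(n) < p, False).astype(int)
    structural_zero = ~at_risk                       # zeros forced by subpopulation membership
    sampling_zero = at_risk & (y == 0)               # zeros that occurred by chance
    return y, structural_zero, sampling_zero


y, sz, nz = simulate_zib(100_000, theta=0.6, p=0.3)
print(y.mean())             # should be near theta * p = 0.18
print(sz.mean(), nz.mean()) # should be near 0.40 and 0.6 * 0.7 = 0.42
```

The simulation itself knows which zeros are structural. Real data carry no such label, so a model is needed to separate the two sources.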
This work proposes a new methodology for fitting zero-inflated Bernoulli data within a Bayesian framework. The methodology can distinguish between two potential sources of zeros (structural and non-structural).
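A Bayesian fit of this kind can be illustrated with a small sketch. The abstract does not give the paper's exact parameterization, priors, or sampler, so everything below is an assumption for illustration. Logistic links are used for both components: logit(θ_i) = x_iᵀα for being at risk and logit(p_i) = z_iᵀβ for y = 1 given risk. Independent normal priors are placed on the coefficients, and a random-walk Metropolis sampler is used. The names `log_posterior` and `metropolis` are hypothetical and are not part of the bayesZIB API.

```python
import numpy as np


def log_posterior(params, y, X, Z, prior_sd=2.5):
    # Assumed layout: first X.shape[1] entries are alpha (logit P(at risk)),
    # the remaining entries are beta (logit P(y = 1 | at risk)).
    k = X.shape[1]
    alpha, beta = params[:k], params[k:]
    log_theta = -np.logaddexp(0.0, -(X @ alpha))  # log sigmoid(X alpha)
    log_p = -np.logaddexp(0.0, -(Z @ beta))       # log sigmoid(Z beta)
    log_q = log_theta + log_p                     # log P(y = 1) = log(theta * p)
    log_1mq = np.log1p(-np.exp(log_q))            # log P(y = 0) = log(1 - theta * p)
    loglik = np.sum(np.where(y == 1, log_q, log_1mq))
    # Independent N(0, prior_sd^2) priors on all coefficients (an assumption).
    logprior = -0.5 * np.sum(params ** 2) / prior_sd ** 2
    return loglik + logprior


def metropolis(y, X, Z, n_iter=3000, step=0.1, seed=1):
    """Random-walk Metropolis sampler targeting log_posterior."""
    rng = np.random.default_rng(seed)
    current = np.zeros(X.shape[1] + Z.shape[1])
    current_lp = log_posterior(current, y, X, Z)
    draws = np.empty((n_iter, current.size))
    accepted = 0
    for t in range(n_iter):
        proposal = current + step * rng.standard_normal(current.size)
        proposal_lp = log_posterior(proposal, y, X, Z)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws[t] = current
    return draws, accepted / n_iter


# Usage on simulated data with different covariates in each component.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # covariates for being at risk
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])  # covariates for y = 1 given risk
alpha_true, beta_true = np.array([0.5, 1.0]), np.array([-0.5, 1.0])
theta = 1 / (1 + np.exp(-(X @ alpha_true)))
p = 1 / (1 + np.exp(-(Z @ beta_true)))
y = ((rng.random(n) < theta) & (rng.random(n) < p)).astype(int)

draws, acc = metropolis(y, X, Z, n_iter=3000, step=0.1)
posterior_mean = draws[1000:].mean(axis=0)  # discard the first 1000 draws as burn-in
print(acc, posterior_mean)
```

Random-walk Metropolis appears here only because it fits in a few lines. It is not necessarily the sampler used in the paper or in bayesZIB. With intercepts only, every observation has P(Y = 1) = θp, so the likelihood depends on θ and p solely through their product. In that case the priors, not the data, drive the separation of the two components.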
The performance of the proposed methodology has been evaluated through a comprehensive simulation study. The methodology has also been implemented as an R package that is freely available to the community. Its use is illustrated with a real example from occupational health, the phenomenon of sickness presenteeism. In this setting it is reasonable to assume that some individuals will never be at risk, because they were not sick during the study period (structural zeros). If structural and non-structural zeros were not separated, general health status and presenteeism itself would be studied jointly. The resulting estimates would be potentially biased, because presenteeism would be implicitly underestimated by being diluted into general health status.
The proposed methodology is able to distinguish two different sources of zeros (structural and non-structural) from dichotomous data with or without covariates in a Bayesian framework, and has been made available to any interested researcher in the form of the bayesZIB R package ( https://cran.r-project.org/package=bayesZIB ).