School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard, Brownsville Campus, Brownsville, TX, 78520, USA.
BMC Med Res Methodol. 2022 Aug 4;22(1):211. doi: 10.1186/s12874-022-01685-8.
Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as an outcome measure after a medical procedure, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Understanding variability in hospital LOS is therefore an important focus in healthcare. Hospital LOS data can be treated as count data, with discrete and non-negative values, typically right skewed, and often exhibiting excessive zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data.
Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database.
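To make the simulation design concrete, the following is a minimal sketch of how counts with a chosen sample size, proportion of structural zeros, and level of overdispersion could be generated. It is not the authors' simulation code: the function names, the parameter values, and the use of a gamma-Poisson mixture to obtain negative binomial counts are all assumptions made for illustration, and the sketch omits the covariates that a regression simulation would include.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small-to-moderate rates used in this illustration.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, mean, zero_prob, size=None, seed=0):
    """Simulate n counts (hypothetical helper, not from the paper).

    zero_prob: probability of a structural (excess) zero.
    size:      NB size parameter; None gives a Poisson count component.
               Smaller size means stronger overdispersion.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        if rng.random() < zero_prob:
            counts.append(0)  # structural zero from the inflation component
            continue
        lam = mean
        if size is not None:
            # Gamma-Poisson mixture: the resulting count is negative binomial
            # with variance mean + mean**2 / size.
            lam = rng.gammavariate(size, mean / size)
        counts.append(poisson_draw(rng, lam))
    return counts


def summarize(counts):
    """Return the share of zeros, the mean, and the variance-to-mean ratio."""
    n = len(counts)
    m = sum(counts) / n
    var = sum((c - m) ** 2 for c in counts) / (n - 1)
    return {"zero_share": counts.count(0) / n, "mean": m, "var_to_mean": var / m}


if __name__ == "__main__":
    # Illustrative scenarios only; the values are assumptions.
    scenarios = {
        "Poisson": dict(zero_prob=0.0, size=None),
        "ZIP": dict(zero_prob=0.3, size=None),
        "NB": dict(zero_prob=0.0, size=1.0),
        "ZINB": dict(zero_prob=0.3, size=1.0),
    }
    for label, kwargs in scenarios.items():
        print(label, summarize(simulate_counts(5000, 3.0, seed=1, **kwargs)))
```

In this sketch, a variance-to-mean ratio near 1 corresponds to equidispersed data, while both zero-inflation and a small `size` push the ratio above 1, which is why the two sources of overdispersion can be hard to tell apart from summary statistics alone.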
Results showed that the Poisson and ZIP models performed poorly in overdispersed data. The ZIP model outperformed the other regression models when the overdispersion was due to zero-inflation only. The NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit in overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated the significance of some of the predictors.
Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes that two different data-generating mechanisms produce the zeros, then the ZINB regression model may provide greater flexibility when modeling the zero-inflation and overdispersion.
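Before choosing among these models, it can help to look at two simple descriptive quantities: the variance-to-mean ratio of the counts and the observed share of zeros compared with the share a Poisson distribution with the same mean would predict. The sketch below is an exploratory check under stated assumptions, not a procedure from the study: the function name and the LOS values are hypothetical, and no cut-off is implied. A formal choice would still rest on fitting and comparing the candidate regression models.

```python
import math


def count_diagnostics(counts):
    """Descriptive checks for overdispersion and excess zeros (illustrative)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return {
        "mean": mean,
        # A ratio near 1 is consistent with equidispersion (Poisson-like data).
        "dispersion_index": variance / mean,
        "observed_zero_share": counts.count(0) / n,
        # Zero probability under a Poisson distribution with the same mean.
        "poisson_zero_share": math.exp(-mean),
    }


if __name__ == "__main__":
    # Hypothetical LOS values in days, invented for this example.
    los_days = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 7, 9, 14, 21]
    for name, value in count_diagnostics(los_days).items():
        print(f"{name}: {value:.3f}")
```

For these invented values, the variance is several times the mean and the share of zeros is far above the Poisson expectation, so both overdispersion and possible zero-inflation would need to be considered when selecting a model.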