A comparison of zero-inflated and hurdle models for modeling zero-inflated count data.

Author information

Feng Cindy Xin

Affiliation

Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, 5790 University Avenue, Halifax, B3H 4R2 Nova Scotia Canada.

Publication information

J Stat Distrib Appl. 2021;8(1):8. doi: 10.1186/s40488-021-00121-4. Epub 2021 Jun 24.

Abstract

Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure has more zeros than a standard count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models remain insufficiently investigated. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
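
The difference in data generating processes can be illustrated with a minimal Python sketch. It assumes Poisson counts with no covariates and illustrative parameter values (pi = 0.3, lam = 2.0). All function names are hypothetical, and this is not the simulation design used in the article. In the zero-inflated Poisson (ZIP) model, zeros arise from two sources: structural zeros with probability pi, and sampling zeros from the Poisson component. In the hurdle Poisson model, a single binary process generates every zero, and positive counts follow a zero-truncated Poisson distribution.

```python
import math
import random


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson(lam) random variable."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson: zeros come from two sources.

    With probability pi the observation is a structural zero; otherwise it is
    drawn from Poisson(lam), which can itself produce a sampling zero.
    """
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(k, lam)


def hurdle_pmf(k: int, p0: float, lam: float) -> float:
    """Hurdle Poisson: a single binary process generates every zero.

    With probability p0 the count is zero; otherwise it is drawn from a
    zero-truncated Poisson(lam), so the count part never yields a zero.
    """
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, lam) / (1 - poisson_pmf(0, lam))


def implied_zip_pi(p0: float, lam: float) -> float:
    """ZIP mixing weight that reproduces a hurdle zero probability p0.

    A negative value means zero deflation, which a ZIP model cannot express.
    """
    e0 = math.exp(-lam)
    return (p0 - e0) / (1 - e0)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def sample_zip(pi: float, lam: float, rng: random.Random) -> int:
    if rng.random() < pi:
        return 0  # structural zero
    return sample_poisson(lam, rng)  # may still be a sampling zero


def sample_hurdle(p0: float, lam: float, rng: random.Random) -> int:
    if rng.random() < p0:
        return 0  # the only source of zeros
    while True:  # rejection sampling from the zero-truncated Poisson
        y = sample_poisson(lam, rng)
        if y > 0:
            return y


if __name__ == "__main__":
    pi, lam = 0.3, 2.0
    p0 = zip_pmf(0, pi, lam)  # hurdle zero probability matched to the ZIP
    for k in range(5):
        print(k, round(zip_pmf(k, pi, lam), 4), round(hurdle_pmf(k, p0, lam), 4))

    rng = random.Random(2021)
    n = 10_000
    zip_zeros = sum(sample_zip(pi, lam, rng) == 0 for _ in range(n)) / n
    hurdle_zeros = sum(sample_hurdle(p0, lam, rng) == 0 for _ in range(n)) / n
    print("empirical zero share:", zip_zeros, hurdle_zeros)

    # Zero deflation: fewer zeros than Poisson(lam) alone would produce.
    print("implied ZIP pi for p0 = 0.05:", implied_zip_pi(0.05, lam))
```

In this covariate-free sketch, a ZIP model with mixing weight pi is the same distribution as a hurdle model with p0 = pi + (1 - pi) * exp(-lam), so the two printed probability columns agree. The reverse does not hold. When p0 < exp(-lam) (zero deflation), the implied pi is negative, so only the hurdle parameterization can describe data with fewer zeros than the Poisson component predicts. This exact correspondence is specific to the covariate-free case shown here.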

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f714/8570364/6379510ac614/40488_2021_121_Fig1_HTML.jpg
