Chipeta Michael G, Ngwira Bagrey M, Simoonga Christopher, Kazembe Lawrence N
Malawi Liverpool - Wellcome Trust Clinical Research Programme, PO Box 30096, Blantyre, Malawi.
BMC Res Notes. 2014 Nov 27;7:856. doi: 10.1186/1756-0500-7-856.
It is common in public health and epidemiology for the outcome of interest to be a count of event occurrences. Analysing such data with classical linear models is usually inappropriate because of overdispersion, even after the outcome variable has been transformed. Zero-adjusted mixture count models, such as zero-inflated and hurdle count models, are applied to count data that show both overdispersion and excess zeros. The main objective of this paper is to apply such models to analyse risk factors associated with infection by the human helminth S. haematobium, particularly where there is a high proportion of zero counts.
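A quick way to see the two problems described above is to compare a sample's variance with its mean, and its observed number of zeros with the number a Poisson model of the same mean would predict. The sketch below does this with Python's standard library. The function name `dispersion_summary` and the counts are hypothetical illustrations, not code or data from the study.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Compare a count sample with a Poisson model that has the same mean."""
    n = len(counts)
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zeros = sum(1 for y in counts if y == 0)
    # Under Poisson(m), P(Y = 0) = exp(-m), so about n * exp(-m) zeros are expected.
    expected_zeros = n * math.exp(-m)
    return {
        "mean": m,
        "variance": v,
        "dispersion_ratio": v / m,  # about 1 for Poisson; well above 1 signals overdispersion
        "observed_zeros": observed_zeros,
        "poisson_expected_zeros": expected_zeros,
    }


# Hypothetical egg counts (NOT the study data): many zeros plus a long right tail.
counts = [0] * 30 + [1, 2, 2, 3, 5, 8, 12, 15, 20, 30]
summary = dispersion_summary(counts)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

A dispersion ratio far above 1 together with many more zeros than the Poisson prediction is the situation in which zero-adjusted models become worth considering.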
The data were collected during a community-based randomised controlled trial assessing the impact of mass drug administration (MDA) with praziquantel in Malawi, and during a school-based cross-sectional epidemiological survey in Zambia. Several count data models were fitted and compared: traditional models (Poisson and negative binomial), zero-modified models (zero-inflated Poisson and zero-inflated negative binomial) and hurdle models (Poisson logit hurdle and negative binomial logit hurdle).
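The six model families differ mainly in how they assign probability to zero and to positive counts. The sketch below writes out their probability mass functions in intercept-only form, with no covariates. In a regression version the mean would typically be linked to covariates through a log link and the zero probability through a logit link. The negative binomial uses the common NB2 parameterisation (mean `mu`, dispersion `k`), which is an assumption here; the function names and parameter values are hypothetical.

```python
import math
from functools import partial


def poisson_pmf(y, lam):
    # P(Y = y) = exp(-lam) * lam**y / y!, computed on the log scale for stability
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def negbin_pmf(y, mu, k):
    # NB2 parameterisation (assumed): mean mu, variance mu + mu**2 / k
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def zero_inflated_pmf(y, pi, base_pmf):
    # Mixture: an extra zero with probability pi, otherwise a draw from the base
    # count model, which can itself also produce zeros.
    extra_zero = pi if y == 0 else 0.0
    return extra_zero + (1 - pi) * base_pmf(y)


def hurdle_pmf(y, p_zero, base_pmf):
    # Two parts: a binary (logit) part decides zero versus positive; positive
    # counts come from the base count model truncated at zero.
    if y == 0:
        return p_zero
    return (1 - p_zero) * base_pmf(y) / (1 - base_pmf(0))


# Hypothetical parameter values, chosen only for illustration.
poisson = partial(poisson_pmf, lam=2.5)
negbin = partial(negbin_pmf, mu=2.5, k=0.5)

models = {
    "Poisson": poisson,
    "NB": negbin,
    "ZIP": partial(zero_inflated_pmf, pi=0.6, base_pmf=poisson),
    "ZINB": partial(zero_inflated_pmf, pi=0.6, base_pmf=negbin),
    "Poisson logit hurdle": partial(hurdle_pmf, p_zero=0.7, base_pmf=poisson),
    "NB logit hurdle": partial(hurdle_pmf, p_zero=0.7, base_pmf=negbin),
}

for name, pmf in models.items():
    total = sum(pmf(y) for y in range(500))  # should be very close to 1
    print(f"{name:22s} P(Y=0) = {pmf(0):.3f}   sum of P(Y=y), y < 500 = {total:.6f}")
```

Each mixture or hurdle model wraps a base count distribution, so the Poisson and negative binomial variants share the same zero-handling code and differ only in the base pmf passed in.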
Using the Akaike information criterion (AIC), the negative binomial logit hurdle (NBLH) and zero-inflated negative binomial (ZINB) models performed best in both datasets. These models also captured the observed zero counts better than the other models.
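AIC balances fit against complexity as AIC = 2k - 2 log L, where k is the number of estimated parameters; lower values are preferred. The sketch below compares an intercept-only Poisson model with an intercept-only Poisson logit hurdle on hypothetical counts and reports each model's expected number of zeros. It is a simplification, not a reproduction of the paper's analysis. It uses no covariates and omits the negative binomial variants, which would need numerical optimisation of the dispersion parameter. The helper names are hypothetical, and the fit assumes at least one zero and at least one count greater than 1.

```python
import math


def poisson_loglik(counts, lam):
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


def fit_truncated_poisson_rate(positives, tol=1e-10):
    """MLE of lam for a zero-truncated Poisson: solves mean(positives) = lam / (1 - exp(-lam))."""
    target = sum(positives) / len(positives)  # must exceed 1 for a positive root
    # lam / (1 - exp(-lam)) rises from 1 and always exceeds lam, so the root lies in (0, target).
    lo, hi = 1e-12, target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hurdle_poisson_fit(counts):
    n = len(counts)
    positives = [y for y in counts if y > 0]
    n_zero, n_pos = n - len(positives), len(positives)
    p_zero = n_zero / n  # MLE of the intercept-only binary (logit) part
    lam = fit_truncated_poisson_rate(positives)
    loglik = n_zero * math.log(p_zero) + n_pos * math.log(1 - p_zero)
    # Zero-truncated Poisson part: divide each Poisson probability by 1 - exp(-lam).
    loglik += sum(y * math.log(lam) - lam - math.lgamma(y + 1) - math.log(-math.expm1(-lam))
                  for y in positives)
    return p_zero, lam, loglik


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Hypothetical counts (NOT the study data).
counts = [0] * 30 + [1, 2, 2, 3, 5, 8, 12, 15, 20, 30]
n = len(counts)

lam_pois = sum(counts) / n  # the Poisson MLE is the sample mean
aic_pois = aic(poisson_loglik(counts, lam_pois), n_params=1)
zeros_pois = n * math.exp(-lam_pois)

p_zero, lam_hurdle, ll_hurdle = hurdle_poisson_fit(counts)
aic_hurdle = aic(ll_hurdle, n_params=2)
zeros_hurdle = n * p_zero

print(f"Poisson:              AIC = {aic_pois:.1f}, expected zeros = {zeros_pois:.1f}")
print(f"Poisson logit hurdle: AIC = {aic_hurdle:.1f}, expected zeros = {zeros_hurdle:.1f}")
print(f"Observed zeros: {counts.count(0)}")
```

In this intercept-only setting the hurdle model's fitted zero probability equals the observed proportion of zeros, which is one simple way to see why zero-adjusted models capture zero counts better than a plain Poisson model.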
This paper showed that the zero-modified NBLH and ZINB models are more appropriate for analysing data with excess zeros. The choice between hurdle and zero-inflated models should be based on the aim and endpoints of the study.
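One way to see why the study aim matters is to compare how each model generates zeros. The notation here is chosen for illustration: f(y; μ) is the base Poisson or negative binomial pmf, π is the zero-inflation probability and π₀ is the hurdle's zero probability.

```latex
\text{Zero-inflated:}\quad
P(Y=0) = \pi + (1-\pi)\, f(0;\mu), \qquad
P(Y=y) = (1-\pi)\, f(y;\mu), \quad y > 0

\text{Hurdle:}\quad
P(Y=0) = \pi_0, \qquad
P(Y=y) = (1-\pi_0)\, \frac{f(y;\mu)}{1 - f(0;\mu)}, \quad y > 0
```

In the zero-inflated model a zero can come either from the extra-zero component or from the count process itself. In the hurdle model every zero comes from the binary part. The two models therefore describe different mechanisms for what a zero count represents.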