Dai Lin, Sweat Michael D, Gebregziabher Mulugeta
1 Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA.
2 Center for Global Health, Medical University of South Carolina, Charleston, South Carolina, USA.
Stat Methods Med Res. 2018 Jan;27(1):208-220. doi: 10.1177/0962280215626608. Epub 2016 Jul 20.
Purpose: To show a novel application of a weighted zero-inflated negative binomial model to count data with excess zeros and heterogeneity, in order to quantify regional variation in HIV/AIDS prevalence in sub-Saharan African countries.

Methods: Data come from the latest round of the Demographic and Health Survey (DHS) conducted in three countries (Ethiopia 2011, Kenya 2009 and Rwanda 2010) using a two-stage cluster sampling design. The outcome is an aggregate count of HIV cases in each census enumeration area of each country. The outcome data are characterized by excess zeros and by heterogeneity due to clustering. We compare scale-weighted zero-inflated negative binomial models with and without random effects to account for zero-inflation, the complex survey design and clustering. Finally, we provide marginalized rate ratio estimates from the best zero-inflated negative binomial model.

Results: The best-fitting zero-inflated negative binomial model is scale-weighted and has a common random intercept for the three countries. Rate ratio estimates from the final model show that HIV prevalence is associated with age and gender distribution, HIV acceptance and HIV knowledge, and that its regional variation is associated with divorce rate, burden of sexually transmitted diseases and rural residence.

Conclusions: A scale-weighted zero-inflated negative binomial model with proper modeling of random effects is shown to be the best model for count data from a complex survey design characterized by excess zeros and extra heterogeneity. In our data example, the final rate ratio estimates show significant regional variation in the factors associated with HIV prevalence, indicating that HIV intervention strategies should be tailored to the unique factors found in each country.
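As a rough illustration of the model class named in the abstract, the following Python sketch writes out a zero-inflated negative binomial (ZINB) probability mass function and a survey-weighted log-likelihood. It is not the authors' implementation. The NB2 parameterization (variance mu + alpha * mu^2), the weight scaling rule (weights rescaled to sum to the number of units) and all function names are assumptions made for illustration; the abstract does not specify them. Covariates and the random intercept are omitted for brevity.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log pmf of a negative binomial with mean mu and dispersion alpha.

    Assumed NB2 parameterization: variance = mu + alpha * mu**2.
    """
    r = 1.0 / alpha  # size parameter
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y, mu, alpha, pi):
    """ZINB pmf: a structural zero with probability pi, otherwise an NB count."""
    nb = math.exp(nb_log_pmf(y, mu, alpha))
    if y == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def scale_weights(weights):
    """Rescale sampling weights to sum to the number of units.

    This is one common scaling convention, assumed here for illustration.
    """
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


def weighted_loglik(counts, weights, mu, alpha, pi):
    """Weighted (pseudo) log-likelihood with a single shared mu, alpha and pi."""
    scaled = scale_weights(weights)
    return sum(
        w * math.log(zinb_pmf(y, mu, alpha, pi))
        for y, w in zip(counts, scaled)
    )


def marginal_mean(mu, pi):
    """Overall (marginal) mean count of the ZINB mixture: (1 - pi) * mu."""
    return (1.0 - pi) * mu
```

In a regression setting, mu and pi would each depend on covariates through a link function, and a ratio of marginal means between two covariate profiles would play the role of the rate ratios reported in the abstract.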