Models for analyzing zero-inflated and overdispersed count data: an application to cigarette and marijuana use.

Author information

Pittman Brian, Buta Eugenia, Krishnan-Sarin Suchitra, O'Malley Stephanie S, Liss Thomas, Gueorguieva Ralitza

Affiliations

Department of Psychiatry, Yale School of Medicine.

Department of Biostatistics, Yale School of Public Health.

Publication information

Nicotine Tob Res. 2018 Apr 18;22(8):1390-8. doi: 10.1093/ntr/nty072.

Abstract

INTRODUCTION

This paper describes different methods for analyzing counts and illustrates their use on cigarette and marijuana smoking data.

METHODS

The Poisson, zero-inflated Poisson (ZIP), hurdle Poisson (HUP), negative binomial (NB), zero-inflated negative binomial (ZINB), and hurdle negative binomial (HUNB) regression models are considered. The approaches are evaluated on their ability to account for zero-inflation (extra zeros) and overdispersion (variance larger than expected) in count outcomes, with emphasis on model fit, interpretation, and choosing an appropriate model given the nature of the data. The illustrative example uses cigarette and marijuana smoking reports from a study of smoking habits among youth e-cigarette users, with gender, age, and e-cigarette use included as predictors.
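As a brief reference for the term "overdispersion" used above, the mean-variance relations of the two base count distributions can be written out. The abstract does not state a parameterization, so the NB2 form and the symbols \mu (mean) and \alpha (dispersion) below are assumptions made for illustration.

```latex
\begin{align*}
\text{Poisson:}\quad & \mathbb{E}[Y] = \mu, & \operatorname{Var}(Y) &= \mu \\
\text{NB (NB2 form, assumed):}\quad & \mathbb{E}[Y] = \mu, & \operatorname{Var}(Y) &= \mu + \alpha\mu^{2}, \quad \alpha > 0
\end{align*}
```

Under this form the NB variance exceeds the mean whenever \alpha > 0, and the Poisson case is recovered as \alpha approaches 0.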

RESULTS

Of the 69 subjects available for analysis, 36% and 64% reported smoking no cigarettes and no marijuana, respectively, suggesting both outcomes might be zero-inflated. Both outcomes were also overdispersed with large positive skew. The ZINB and HUNB models fit the cigarette counts best. According to goodness-of-fit statistics, the NB, HUNB, and ZINB models fit the marijuana data well, but the ZINB provided better interpretation.
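The kind of screening described above (share of zeros, variance relative to the mean) can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `count_diagnostics` and the example counts are hypothetical, and the counts are made up rather than taken from the study.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Return simple screening statistics for a list of non-negative counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the dispersion index is undefined")
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return {
        "mean": m,
        "variance": v,
        # Roughly 1 under a Poisson model; values well above 1 suggest overdispersion.
        "dispersion_index": v / m,
        "observed_zero_share": observed_zero_share,
        # Share of zeros a Poisson distribution with the same mean would produce.
        "poisson_zero_share": math.exp(-m),
    }


# Made-up counts for illustration only; NOT the study's data.
example_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 8, 12, 20, 30]
stats = count_diagnostics(example_counts)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

An observed zero share far above the Poisson-implied share, together with a dispersion index well above 1, is only a first hint of zero-inflation and overdispersion. Formal model comparison is still needed.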

CONCLUSION

In the absence of zero-inflation, the NB model fits smoking data, which are typically overdispersed, well. In the presence of zero-inflation, the ZINB or HUNB model is recommended to account for additional heterogeneity. In addition to model fit and interpretability, the choice between a zero-inflated and a hurdle model should ultimately depend on the assumptions regarding the zeros, the study design, and the research question being asked.
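The difference in assumptions about the zeros can be made concrete with the probability mass functions of the two model families. The sketch below uses a Poisson base distribution for brevity (the abstract recommends the NB-based versions). The function names and the parameter values `lam` and `pi` are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of observing count y with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: a zero is either a 'structural' zero
    (probability pi) or an ordinary zero from the Poisson count process."""
    if y == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(y, lam)


def hurdle_poisson_pmf(y, lam, pi):
    """Hurdle Poisson: every zero comes from a single binary process
    (probability pi); positive counts follow a zero-truncated Poisson."""
    if y == 0:
        return pi
    return (1 - pi) * poisson_pmf(y, lam) / (1 - poisson_pmf(0, lam))


# Hypothetical parameter values, chosen only for illustration.
lam, pi = 2.0, 0.3
p0_zip = zip_pmf(0, lam, pi)
p0_hurdle = hurdle_poisson_pmf(0, lam, pi)
print(f"P(Y=0) zero-inflated: {p0_zip:.4f}")
print(f"P(Y=0) hurdle:        {p0_hurdle:.4f}")
```

In the zero-inflated form the zeros come from two sources, whereas the hurdle form treats all zeros as a single class. This is why the choice depends on what a zero is taken to mean in the study.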

IMPLICATIONS

Count outcomes are frequent in tobacco research; they often contain many zeros and exhibit large variance and skew. Analyzing such data with methods that require a normally distributed outcome is inappropriate and will likely produce spurious results. This study compares and contrasts appropriate methods for analyzing count data, specifically data with an overabundance of zeros, and illustrates their use on cigarette and marijuana smoking data. Recommendations are provided.
