A Bayesian approach for analyzing zero-inflated clustered count data with dispersion.

Affiliations

Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Rockville, Maryland 20850, USA.

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky 40202, USA.

Publication information

Stat Med. 2018 Feb 28;37(5):801-812. doi: 10.1002/sim.7541. Epub 2017 Nov 6.

DOI: 10.1002/sim.7541

PMID: 29108124

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5799048/

Abstract

In practice, count data may exhibit varying dispersion patterns and excessive zero values; additionally, they may appear in groups or clusters sharing a common source of variation. We present a novel Bayesian approach for analyzing such data. To model these features, we combine the Conway-Maxwell-Poisson distribution, which allows both overdispersion and underdispersion, with a hurdle component for the zeros and random effects for clustering. We propose an efficient Markov chain Monte Carlo sampling scheme to obtain posterior inference from our model. Through simulation studies, we compare our hurdle Conway-Maxwell-Poisson model with a hurdle Poisson model to demonstrate the effectiveness of our Conway-Maxwell-Poisson approach. Furthermore, we apply our model to analyze an illustrative dataset containing information on the number and types of carious lesions on each tooth in a population of 9-year-olds from the Iowa Fluoride Study, which is an ongoing longitudinal study on a cohort of Iowa children that began in 1991.
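The abstract describes the model only in words. As a rough illustration, the following Python sketch evaluates a hurdle Conway-Maxwell-Poisson (CMP) probability mass function. It rests on assumptions that the abstract does not state. It uses the standard (λ, ν) CMP parameterization P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), where the series Z is truncated at a finite number of terms. All zeros are assigned to a single hurdle probability `p_zero`. Clustering enters through a log-linear rate with a Gaussian cluster-level random intercept. All function and parameter names are hypothetical, and the paper's actual parameterization, priors, and MCMC sampler may differ. In this parameterization, ν < 1 gives overdispersion, ν = 1 recovers the Poisson, and ν > 1 gives underdispersion.

```python
import math


def cmp_log_z(lam: float, nu: float, max_terms: int = 500) -> float:
    """Log of the CMP normalizing constant Z(lam, nu) = sum_j lam**j / (j!)**nu.

    Assumption of this sketch: truncating the series at max_terms is adequate
    for lam > 0, nu > 0 and moderate lam.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    top = max(log_terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def cmp_pmf(y: int, lam: float, nu: float) -> float:
    """P(Y = y) under CMP(lam, nu); nu = 1 gives the Poisson(lam) pmf."""
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - cmp_log_z(lam, nu))


def hurdle_cmp_pmf(y: int, p_zero: float, lam: float, nu: float) -> float:
    """Hurdle CMP (hypothetical form): zeros come only from the hurdle with
    probability p_zero; positive counts follow a zero-truncated CMP."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * cmp_pmf(y, lam, nu) / (1.0 - cmp_pmf(0, lam, nu))


def cmp_mean_var(lam: float, nu: float, max_y: int = 500) -> tuple[float, float]:
    """Mean and variance of CMP(lam, nu) from the truncated pmf."""
    log_z = cmp_log_z(lam, nu, max_y)  # computed once, reused for every y
    probs = [math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
             for y in range(max_y)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


def cluster_rate(x: list[float], beta: list[float], b_cluster: float) -> float:
    """Hypothetical log-linear rate with a cluster-level random intercept:
    log(lam_ij) = x_ij' beta + b_i, with b_i ~ N(0, sigma^2) in the model."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)) + b_cluster)


if __name__ == "__main__":
    lam = cluster_rate([1.0, 0.5], [0.2, -0.4], b_cluster=0.1)
    for nu in (0.5, 1.0, 2.0):
        m, v = cmp_mean_var(2.0, nu)
        print(f"nu={nu}: mean={m:.3f}, var={v:.3f}")
    print([round(hurdle_cmp_pmf(y, 0.4, lam, 0.7), 4) for y in range(5)])
```

Comparing the printed mean and variance for different ν shows how a single extra parameter lets the CMP move between over- and underdispersion, which a hurdle Poisson cannot do. In a Bayesian fit, these densities would enter the likelihood inside an MCMC sampler. For efficiency, `cmp_pmf` would be rewritten to reuse the normalizing constant rather than recompute it on every call.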