Bivariate zero-inflated regression for count data: a Bayesian approach with application to plant counts.

Authors

Majumdar Anandamayee, Gries Corinna

Affiliation

Arizona State University, AZ, USA.

Publication

Int J Biostat. 2010;6(1):Article 27. doi: 10.2202/1557-4679.1229.

Abstract

Lately, bivariate zero-inflated (BZI) regression models have been used in many instances in the medical sciences to model excess zeros. Examples include the BZI Poisson (BZIP) and BZI negative binomial (BZINB) models. Such formulations vary in the basic modeling aspect and use the EM algorithm (Dempster, Laird and Rubin, 1977) for parameter estimation. A different modeling formulation in the Bayesian context is given by Dagne (2004). We extend the modeling to a more general setting for multivariate ZIP models for count data with excess zeros as proposed by Li, Lu, Park, Kim, Brinkley and Peterson (1999), focusing on a particular bivariate regression formulation. For the basic formulation in the case of bivariate data, we assume that Xi, i = 0, 1, 2, are (latent) independent Poisson random variables with parameters λi. A bivariate count response vector (Y1, Y2) follows a mixture of four distributions: p0 is the mixing probability of a point mass at (0, 0); p1 is the mixing probability that Y2 = 0 while Y1 = X0 + X1; p2 is the mixing probability that Y1 = 0 while Y2 = X0 + X2; and (1 - p0 - p1 - p2) is the mixing probability that Yi = Xi + X0, i = 1, 2. The choice of the parameters {pi, λi, i = 0, 1, 2} ensures that the marginal distribution of each Yi is zero-inflated Poisson with rate λ0 + λi. All the parameters thus introduced are allowed to depend on covariates through canonical link generalized linear models (McCullagh and Nelder, 1989). This flexibility allows for a range of real-life applications, especially in the medical and biological fields, where the counts are bivariate in nature (with strong association between the processes) and where there is an excess of zeros in one or both processes. Our contribution in this paper is to employ a fully Bayesian approach, consolidating the work of Dagne (2004) and Li et al. (1999) and generalizing the modeling and sampling-based methods described by Ghosh, Mukhopadhyay and Lu (2006), to estimate the parameters and obtain posterior credible intervals both when covariates are available and when they are not. In this context, we provide explicit data augmentation techniques that lend themselves to easier implementation of the Gibbs sampler by giving rise to well-known and closed-form posterior distributions in the bivariate ZIP case. We then use simulations to explore the effectiveness of this estimation using the Bayesian BZIP procedure, comparing its performance to the Bayesian and classical ZIP approaches. Finally, we demonstrate the methodology on bivariate plant count data with excess zeros collected on plots in the Phoenix metropolitan area and compare the results with independent ZIP regression models fitted to both processes.
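The four-component mixture described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch of the data-generating model without covariates, not the authors' Gibbs sampler. The function names (`sample_poisson`, `sample_bzip`, `zip_marginal_zero_prob`, `simulate`) and all parameter values are hypothetical and chosen for illustration. Under the stated mixture, Y1 is forced to zero with probability p0 + p2 and otherwise equals X0 + X1, so its marginal should be zero-inflated Poisson with rate λ0 + λ1; the sketch lets one check that empirically.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small rates used here; not meant for large lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_bzip(p0, p1, p2, lam0, lam1, lam2, rng):
    """Draw one (Y1, Y2) pair from the four-component mixture in the abstract."""
    # Latent independent Poisson variables X0, X1, X2.
    x0 = sample_poisson(lam0, rng)
    x1 = sample_poisson(lam1, rng)
    x2 = sample_poisson(lam2, rng)
    u = rng.random()
    if u < p0:                      # point mass at (0, 0)
        return 0, 0
    if u < p0 + p1:                 # Y2 = 0, Y1 = X0 + X1
        return x0 + x1, 0
    if u < p0 + p1 + p2:            # Y1 = 0, Y2 = X0 + X2
        return 0, x0 + x2
    return x0 + x1, x0 + x2         # both counts share the common term X0


def zip_marginal_zero_prob(pi, lam):
    """P(Y = 0) for a zero-inflated Poisson with inflation pi and rate lam."""
    return pi + (1.0 - pi) * math.exp(-lam)


def simulate(n, params, seed=0):
    """Draw n pairs; params = (p0, p1, p2, lam0, lam1, lam2), illustrative only."""
    rng = random.Random(seed)
    return [sample_bzip(*params, rng) for _ in range(n)]


if __name__ == "__main__":
    params = (0.3, 0.1, 0.2, 1.0, 2.0, 1.5)  # hypothetical values
    draws = simulate(20000, params, seed=1)
    frac_zero_y1 = sum(1 for y1, _ in draws if y1 == 0) / len(draws)
    expected = zip_marginal_zero_prob(params[0] + params[2], params[3] + params[4])
    print("empirical P(Y1 = 0):", round(frac_zero_y1, 3))
    print("ZIP marginal P(Y1 = 0):", round(expected, 3))
```

The shared latent term X0 is what induces positive association between Y1 and Y2 in the last component, which is the feature that independent ZIP models fitted to each process cannot capture.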
