Department of Biostatistics, University of North Carolina at Chapel Hill, NC, USA.
Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, NC, USA.
Stat Methods Med Res. 2023 Jul;32(7):1300-1317. doi: 10.1177/09622802231172028. Epub 2023 May 11.
The zero-inflated negative binomial distribution has been widely used for count data analyses in various biomedical settings because it can model both excess zeros and overdispersion. When count variables are correlated, a bivariate model is essential for understanding their full distributional features. Examples include measuring the correlation between two genes in sparse single-cell RNA sequencing (scRNA-seq) data and modeling dental caries count indices on two different tooth surface types. For these purposes, we develop a richly parameterized bivariate zero-inflated negative binomial model that has a simple latent variable framework and eight free parameters with intuitive interpretations. In the scRNA-seq data example, the correlation is estimated after adjusting for the effects of dropout events represented by excess zeros. In the dental caries data, we analyze how treatment with xylitol lozenges affects the marginal mean and other response patterns manifested in the two dental caries traits. An R package, "bzinb", is available on the Comprehensive R Archive Network (CRAN).
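The abstract mentions a latent variable framework with eight free parameters but does not spell it out. The sketch below assumes one common way to build such a model, which is not taken from the paper or from the "bzinb" package. Three independent Gamma variables R0, R1, R2 drive two Poisson counts, Y1 with rate b1*(R0+R1) and Y2 with rate b2*(R0+R2). The shared R0 correlates the counts, and each margin is negative binomial. A four-component mixture with weights p1 to p4 then forces one, both, or neither count to zero. The parameter names (a0, a1, a2, b1, b2, p1, p2, p3) and the helpers `BZINBParams`, `moments`, and `sample` are hypothetical. The count of 3 shapes, 2 scales, and 3 free weights happens to match "eight", but the paper's actual parameterization may differ.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BZINBParams:
    """Hypothetical layout of eight free parameters: gamma shapes a0, a1, a2,
    scale parameters b1, b2, and mixture weights p1, p2, p3 (p4 is implied)."""
    a0: float
    a1: float
    a2: float
    b1: float
    b2: float
    p1: float  # both counts may be nonzero
    p2: float  # second count forced to zero
    p3: float  # first count forced to zero

    @property
    def p4(self) -> float:
        """Weight of the component where both counts are forced to zero."""
        return 1.0 - self.p1 - self.p2 - self.p3


def moments(th: BZINBParams) -> dict:
    """Closed-form moments implied by the assumed latent construction."""
    mu1 = th.b1 * (th.a0 + th.a1)  # E[Y1]: Poisson rate b1*(R0+R1), R0+R1 ~ Gamma(a0+a1)
    mu2 = th.b2 * (th.a0 + th.a2)
    var_y1 = mu1 + th.b1 * mu1  # negative binomial variance: mean + b1*mean
    var_y2 = mu2 + th.b2 * mu2
    cov_y = th.b1 * th.b2 * th.a0  # covariance comes from the shared R0 only
    pi1, pi2 = th.p1 + th.p2, th.p1 + th.p3  # P(X1 not forced to 0), P(X2 not forced to 0)
    mean1, mean2 = pi1 * mu1, pi2 * mu2
    var1 = pi1 * (var_y1 + mu1 ** 2) - mean1 ** 2
    var2 = pi2 * (var_y2 + mu2 ** 2) - mean2 ** 2
    cov = th.p1 * (cov_y + mu1 * mu2) - mean1 * mean2  # only component 1 leaves both free
    return {
        "mean": (mean1, mean2),
        "var": (var1, var2),
        "corr": cov / math.sqrt(var1 * var2),  # observed counts, excess zeros included
        "latent_corr": cov_y / math.sqrt(var_y1 * var_y2),  # before zero inflation
        "p_zero1": 1.0 - pi1 + pi1 * (1.0 + th.b1) ** -(th.a0 + th.a1),
    }


def _poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample(th: BZINBParams, n: int, seed: int = 0) -> list:
    """Draw n pairs (X1, X2) from the assumed latent-variable model."""
    rng = random.Random(seed)
    cut1, cut2, cut3 = th.p1, th.p1 + th.p2, th.p1 + th.p2 + th.p3
    draws = []
    for _ in range(n):
        r0, r1, r2 = (rng.gammavariate(a, 1.0) for a in (th.a0, th.a1, th.a2))
        y1 = _poisson(th.b1 * (r0 + r1), rng)  # shared r0 correlates y1 and y2
        y2 = _poisson(th.b2 * (r0 + r2), rng)
        u = rng.random()  # pick the zero-inflation component
        if u < cut1:
            draws.append((y1, y2))
        elif u < cut2:
            draws.append((y1, 0))
        elif u < cut3:
            draws.append((0, y2))
        else:
            draws.append((0, 0))
    return draws


theta = BZINBParams(a0=1, a1=1, a2=1, b1=1, b2=1, p1=0.5, p2=0.2, p3=0.2)
print(moments(theta))
```

Under this assumed construction, both counts can be nonzero only in the p1 component. The zero-inflation mixture can therefore distort the observed correlation relative to the correlation of the latent counts Y1 and Y2. With the example values, the observed correlation is about 0.15, while the latent correlation is 0.25. Reporting the latent quantity is one plausible reading of "adjusting for dropout events". Parameter estimation is not shown here.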