Kang Bokgyeong, Hughes John, Haran Murali
Department of Statistical Science, Duke University.
College of Health, Lehigh University.
J Comput Graph Stat. 2025;34(2):697-706. doi: 10.1080/10618600.2024.2394460. Epub 2024 Sep 24.
Count data with complex features arise in many disciplines, including ecology, agriculture, criminology, medicine, and public health. Zero inflation, spatial dependence, and non-equidispersion are common features of count data. Two classes of models currently allow for these features: the mode-parameterized Conway-Maxwell-Poisson (COMP) distribution and the generalized Poisson model. However, both require either constraints on the parameter space or a parameterization that is difficult to interpret. We propose spatial mean-parameterized COMP models that retain the flexibility of these models while resolving the above issues. We use a Bayesian spatial filtering approach to handle high-dimensional spatial data efficiently, and we use reversible-jump MCMC to choose the basis vectors for spatial filtering automatically. The COMP distribution poses two additional computational challenges: an intractable normalizing function in the likelihood and no closed-form expression for the mean. We propose a fast computational approach that addresses these challenges by, respectively, introducing an efficient auxiliary variable algorithm and pre-computing key approximations for fast likelihood evaluation. We illustrate the application of our methodology to simulated and real datasets, including Texas HPV-cancer data and US vaccine refusal data. Supplementary materials for this article are available online.
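The two computational challenges named in the abstract (a normalizing function with no closed form and a mean with no closed form) can be made concrete with a small sketch. It assumes the standard COMP form P(Y = y) proportional to lambda^y / (y!)^nu and uses a brute-force truncated series plus bisection to recover lambda from a target mean mu. This is not the authors' method, which relies on an auxiliary variable algorithm and pre-computed approximations; the function names, the truncation length, and the bisection bounds are all hypothetical choices made for illustration.

```python
import math


def comp_log_z_and_mean(log_lam, nu, max_terms=1000):
    """Truncated-series log normalizing constant and mean of COMP(lambda, nu).

    Assumption: nu > 0 and the mean is far below max_terms, so the
    truncation error is negligible.
    """
    # Log of the unnormalized terms lambda^j / (j!)^nu for j = 0..max_terms-1.
    log_terms = [j * log_lam - nu * math.lgamma(j + 1) for j in range(max_terms)]
    shift = max(log_terms)  # subtract the max for numerical stability
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    log_z = shift + math.log(total)
    mean = sum(j * w for j, w in enumerate(weights)) / total
    return log_z, mean


def solve_log_lambda(mu, nu, lo=-30.0, hi=30.0, tol=1e-10):
    """Find log(lambda) whose COMP mean equals mu, by bisection.

    The mean is increasing in lambda for fixed nu, so bisection applies.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        _, mean = comp_log_z_and_mean(mid, nu)
        if mean < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def comp_log_pmf(y, mu, nu):
    """Log pmf of a mean-parameterized COMP(mu, nu) at a count y."""
    log_lam = solve_log_lambda(mu, nu)
    log_z, _ = comp_log_z_and_mean(log_lam, nu)
    return y * log_lam - nu * math.lgamma(y + 1) - log_z
```

With nu = 1 the distribution reduces to the Poisson, so `comp_log_pmf(2, 3.0, 1.0)` should agree with the Poisson(3) log probability at 2. Every likelihood evaluation here repeats a root-finding loop over a long series, which gives a sense of why pre-computing approximations is attractive for large spatial datasets.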