Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore.
Philos Trans A Math Phys Eng Sci. 2023 May 15;381(2247):20220145. doi: 10.1098/rsta.2022.0145. Epub 2023 Mar 27.
Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared with traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via a finite mixture with a random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored Markov chain Monte Carlo schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
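The likelihood described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameter names (`p_zero`, `r`, `q`), and the particular Negative Binomial parameterization (number of failures before the `r`-th success, with success probability `q`, shifted by one) are assumptions made for illustration. The sketch covers only the sampling model for one subject; the mixture prior and the Markov chain Monte Carlo scheme are not shown.

```python
import math


def shifted_nb_pmf(y, r, q):
    """P(Y = y) on y = 1, 2, ...: a Negative Binomial(r, q) shifted by one.

    Assumed parameterization: y - 1 counts failures before the r-th success,
    each trial succeeding with probability q in (0, 1); r > 0 may be non-integer.
    """
    if y < 1:
        return 0.0
    k = y - 1  # count under the unshifted Negative Binomial
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(q) + k * math.log(1.0 - q))
    return math.exp(log_pmf)


def hurdle_pmf(y, p_zero, r, q):
    """Hurdle model: a point mass p_zero at zero, shifted NB on the positives."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * shifted_nb_pmf(y, r, q)


def joint_log_lik(counts, p_zeros, rs, qs):
    """Log-likelihood of one subject's counts across several processes.

    Given the parameters, the processes are treated as independent,
    so the joint log-likelihood is a sum over processes.
    """
    return sum(math.log(hurdle_pmf(y, p, r, q))
               for y, p, r, q in zip(counts, p_zeros, rs, qs))


# Hypothetical example: one subject, three count processes.
counts = [0, 3, 1]
log_lik = joint_log_lik(counts, p_zeros=[0.6, 0.2, 0.5],
                        rs=[2.0, 1.5, 3.0], qs=[0.3, 0.4, 0.5])
```

Because the processes are conditionally independent, each process contributes only its own zero probability and its two Negative Binomial parameters, which is the source of the parameter reduction relative to a fully multivariate count model.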