Chakrabarti Arhit, Ni Yang, Morris Ellen Ruth A, Salinas Michael L, Chapkin Robert S, Mallick Bani K
Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA.
Department of Statistics, CPRIT Single Cell Data Science Core, Texas A&M University, College Station, TX 77843-3143, USA.
J Mach Learn Res. 2024;25.
We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a known directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming each random measure to be distributed as a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups. We characterize the graphical Dirichlet process using a novel hypergraph representation as well as the stick-breaking representation, the restaurant-type representation, and the representation as a limit of a finite mixture model. We develop an efficient posterior inference algorithm and illustrate our model with simulations and a real grouped single-cell data set.
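The stick-breaking view mentioned in the abstract can be illustrated with a small truncated simulation. The sketch below is not the authors' algorithm. It assumes, purely for illustration, that all groups share a single concentration parameter `alpha`, that root groups use a common stick-breaking base measure with parameter `gamma`, and that a child group's base measure is the equal-weight average of its parents' measures. The function names, the truncation level `K`, and the averaging rule are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def stick_breaking(concentration, K, rng):
    """Truncated stick-breaking weights over K atoms (they sum to 1)."""
    v = rng.beta(1.0, concentration, size=K)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def sample_group_weights(parents, alpha, gamma, K, rng):
    """Sample cluster weights for every group of a DAG.

    parents: dict mapping each group to a list of its parent groups,
             with keys listed in topological order (parents first).
    Assumption (illustrative only): a child's base measure is the
    equal-weight average of its parents' weight vectors.
    """
    beta = stick_breaking(gamma, K, rng)  # shared base weights for roots
    weights = {}
    for group, pa in parents.items():
        if pa:
            base = np.mean([weights[p] for p in pa], axis=0)
        else:
            base = beta
        # A DP with a discrete base measure on K atoms has Dirichlet
        # weights; the small floor keeps every parameter positive.
        weights[group] = rng.dirichlet(alpha * base + 1e-12)
    return beta, weights

# Hypothetical DAG: groups A and B are roots, C has both as parents.
rng = np.random.default_rng(0)
dag = {"A": [], "B": [], "C": ["A", "B"]}
beta, weights = sample_group_weights(dag, alpha=5.0, gamma=2.0, K=10, rng=rng)
```

Because every group places its weights on the same `K` atoms, clusters are shared across groups, while each group's weights depend only on its parents, which mirrors the Markov property described in the abstract.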