A Nonparametric Bayesian Model for Nested Clustering.

Author information

Lee Juhee, Müller Peter, Zhu Yitan, Ji Yuan

Affiliations

Department of Applied Mathematics and Statistics, UC Santa Cruz, Santa Cruz, CA, USA.

Department of Mathematics, UT Austin, Austin, TX, USA.

Publication information

Methods Mol Biol. 2016;1362:129-41. doi: 10.1007/978-1-4939-3106-4_8.

DOI:10.1007/978-1-4939-3106-4_8

PMID:26519174

Abstract

We propose a nonparametric Bayesian model for clustering where clusters of experimental units are determined by a shared pattern of clustering another set of experimental units. The proposed model is motivated by the analysis of protein activation data, where we cluster proteins such that all proteins in one cluster give rise to the same clustering of patients. That is, we define clusters of proteins by the way that patients group with respect to the corresponding protein activations. This is in contrast to (almost) all currently available models, which use shared parameters in the sampling model to define clusters. These include, in particular, model-based clustering, Dirichlet process mixtures, product partition models, and more. We show results for two typical biostatistical inference problems that give rise to clustering.
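
To make the nested structure concrete, the following minimal Python sketch draws one nested partition: proteins are first split into clusters, and each protein cluster then receives its own partition of the patients. This is an illustration only. The abstract does not state which prior the authors use, so the Chinese restaurant process (Pólya urn) used at both levels here is an assumption, and the names crp_partition, nested_partition, alpha_protein and alpha_patient are hypothetical.

```python
import random


def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Assumption: the CRP stands in for whatever partition prior the
    authors actually use; alpha > 0 is the concentration parameter.
    """
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def nested_partition(n_proteins, n_patients, alpha_protein, alpha_patient, seed=0):
    """Draw a protein partition, then one patient partition per protein cluster."""
    rng = random.Random(seed)
    protein_labels = crp_partition(n_proteins, alpha_protein, rng)
    n_protein_clusters = max(protein_labels) + 1
    # All proteins in the same cluster share the same clustering of patients.
    patient_labels = {
        c: crp_partition(n_patients, alpha_patient, rng)
        for c in range(n_protein_clusters)
    }
    return protein_labels, patient_labels


if __name__ == "__main__":
    proteins, patients = nested_partition(6, 10, 1.0, 1.0, seed=1)
    print("protein cluster labels:", proteins)
    for c, labels in patients.items():
        print(f"patient partition for protein cluster {c}:", labels)
```

The sketch covers only the prior over partitions. It has no sampling model for the protein activations and no posterior inference, both of which a full analysis would require.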
