Ni Yang, Müller Peter, Diesendruck Maurice, Williamson Sinead, Zhu Yitan, Ji Yuan
Department of Statistics, Texas A&M University.
Department of Statistics and Data Sciences, The University of Texas at Austin.
J Comput Graph Stat. 2020;29(1):53-65. doi: 10.1080/10618600.2019.1624366. Epub 2019 Jul 19.
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference under a wide range of Bayesian nonparametric mixture models feasible for large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and classification performance that is competitive with other widely used classifiers. Supplementary materials for this article are available online.