Scalable Bayesian Nonparametric Clustering and Classification.

Authors

Ni Yang, Müller Peter, Diesendruck Maurice, Williamson Sinead, Zhu Yitan, Ji Yuan

Affiliations

Department of Statistics, Texas A&M University.

Department of Statistics and Data Sciences, The University of Texas at Austin.

Publication information

J Comput Graph Stat. 2020;29(1):53-65. doi: 10.1080/10618600.2019.1624366. Epub 2019 Jul 19.

Abstract

We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and competitive classification performance relative to other widely used classifiers. Supplementary materials for this article are available online.
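
The abstract does not spell out the individual steps, so the following is only a minimal sketch of one way a multi-step, embarrassingly parallel clustering scheme can be organized. It assumes a shard-then-merge structure: data are split into shards, each shard is clustered independently, and the resulting cluster summaries are then clustered again with the same routine. The names `local_cluster`, `multi_step_cluster`, `n_shards`, and `gap` are hypothetical, and the per-step routine is a deterministic gap-based grouping of 1-D values used as a stand-in for the Markov chain Monte Carlo sampler, not the authors' method.

```python
from concurrent.futures import ThreadPoolExecutor


def local_cluster(items, gap=1.0):
    """Stand-in for the per-step sampler (NOT MCMC, an assumption for illustration).

    `items` is a list of (value, weight) pairs. Sorted neighbours closer than
    `gap` share a cluster. Returns one (weighted mean, total weight) per cluster.
    """
    summaries = []
    total, weight, prev = 0.0, 0.0, None
    for value, w in sorted(items):
        if prev is not None and value - prev > gap:
            # Gap exceeded: close the current cluster and start a new one.
            summaries.append((total / weight, weight))
            total, weight = 0.0, 0.0
        total += value * w
        weight += w
        prev = value
    if weight > 0:
        summaries.append((total / weight, weight))
    return summaries


def multi_step_cluster(data, n_shards=4, gap=1.0):
    # Step 1: split the data into shards and cluster each shard independently.
    # The shards do not communicate, so this step runs in parallel.
    shards = [[(x, 1.0) for x in data[i::n_shards]] for i in range(n_shards)]
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(lambda s: local_cluster(s, gap), shards))
    # Step 2: treat the shard-level clusters as the new units and apply the
    # same routine to their summaries to obtain global clusters.
    units = [summary for shard in local for summary in shard]
    return local_cluster(units, gap)


data = [0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3]
clusters = multi_step_cluster(data, n_shards=2, gap=1.0)
```

Reusing one routine at every step mirrors the abstract's remark that each step can be implemented with the same sampler. In a real application the stand-in would be replaced by a posterior sampler for the chosen mixture model.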
