Centered Partition Processes: Informative Priors for Clustering (with Discussion).

Authors

Paganin Sally, Herring Amy H, Olshan Andrew F, Dunson David B

Affiliations

Department of Environmental Science, Policy, and Management, University of California, Berkeley.

Department of Statistical Science, Duke University, Durham.

Publication information

Bayesian Anal. 2021 Mar;16(1):301-370. doi: 10.1214/20-BA1197. Epub 2020 Feb 13.

Abstract

There is a rich literature proposing Bayesian approaches to clustering that start from a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Although there have been some proposals to relax the exchangeability assumption, allowing covariate dependence and partial exchangeability, little consideration has been given to how to include concrete prior knowledge on the partition. For example, we are motivated by an epidemiological application in which we wish to cluster birth defects into groups and have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. We describe some properties of the CP prior, develop a general algorithm for posterior computation, and illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.
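
To make the idea of "modifying the EPPF to favor partitions close to an initial one" concrete, here is a minimal sketch. It is not taken from the abstract, which does not state the functional form. The sketch assumes an exponential tilting of a baseline EPPF, p(c | c0, psi) ∝ p0(c) exp(-psi * d(c, c0)), with a Dirichlet process EPPF as p0 and the Variation of Information as the distance d. All function and parameter names (`set_partitions`, `dp_eppf`, `cp_prior`, `psi`, `alpha`) are hypothetical, and the brute-force enumeration is only feasible for a handful of items.

```python
import math
from collections import Counter


def set_partitions(n):
    """Yield every partition of items 0..n-1 as a tuple of cluster labels
    in restricted-growth form (first label is 0, a new label is max + 1)."""
    def grow(labels, n_clusters):
        if len(labels) == n:
            yield tuple(labels)
            return
        for k in range(n_clusters + 1):
            yield from grow(labels + [k], max(n_clusters, k + 1))
    yield from grow([], 0)


def dp_eppf(labels, alpha=1.0):
    """Dirichlet process EPPF: alpha^K * prod_j (n_j - 1)! / (alpha)_n,
    where (alpha)_n is the rising factorial."""
    sizes = Counter(labels).values()
    rising = math.prod(alpha + i for i in range(len(labels)))
    blocks = math.prod(math.factorial(s - 1) for s in sizes)
    return alpha ** len(sizes) * blocks / rising


def entropy(counts, n):
    """Shannon entropy (base 2) of the block proportions counts / n."""
    return -sum(m / n * math.log2(m / n) for m in counts)


def variation_of_information(a, b):
    """VI(a, b) = 2 H(a, b) - H(a) - H(b); zero iff the partitions agree."""
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_joint = entropy(Counter(zip(a, b)).values(), n)
    return 2 * h_joint - h_a - h_b


def cp_prior(c0, psi, alpha=1.0):
    """Assumed centered-partition prior: baseline EPPF tilted by
    exp(-psi * distance to c0), normalised by brute-force enumeration."""
    parts = list(set_partitions(len(c0)))
    weights = [
        dp_eppf(c, alpha) * math.exp(-psi * variation_of_information(c, c0))
        for c in parts
    ]
    total = sum(weights)
    return {c: w / total for c, w in zip(parts, weights)}


if __name__ == "__main__":
    c0 = (0, 0, 1, 1)  # hypothetical expert clustering of four items
    for psi in (0.0, 1.0, 3.0):
        print(psi, round(cp_prior(c0, psi)[c0], 4))
```

Under these assumptions, `psi = 0` recovers the baseline exchangeable prior, and larger values of `psi` concentrate more prior mass on the initial partition `c0`.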
