Christopher Yau, Chris Holmes
Department of Statistics, University of Oxford, Oxford, U.K.
Bayesian Anal. 2011 Jul 1;6(2):329-352. doi: 10.1214/11-BA612.
We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical, population-based nonparametric prior on the cluster locations, scaled by the inverse covariance matrices of the likelihood, we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest, including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. It also allows for cluster-specific variable selection. We demonstrate improved inference on a number of canonical problems.
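The computational point in the abstract is conditional conjugacy: when the prior on a cluster's location is scaled by the likelihood's (co)variance, the full conditional for that location has closed form and can be drawn directly within a Gibbs sweep. The sketch below illustrates this only in the simplest univariate case, assuming a Normal prior mu | sigma2 ~ N(m, sigma2 / kappa) for one covariate in one cluster. It is not the authors' model or code, and the names `conjugate_mean_update`, `gibbs_draw_mean`, `m` and `kappa` are hypothetical. A large `kappa` pins the cluster mean to the population mean `m`, which loosely mirrors how a sparsity prior can make a covariate irrelevant to the clustering.

```python
import random


def conjugate_mean_update(x: list[float], sigma2: float, m: float,
                          kappa: float) -> tuple[float, float]:
    """Closed-form posterior of a cluster mean mu for one covariate.

    Illustrative model (an assumption, not the paper's exact prior):
        x_i | mu, sigma2 ~ Normal(mu, sigma2)         likelihood
        mu  | sigma2     ~ Normal(m, sigma2 / kappa)  prior scaled by sigma2
    Returns (posterior mean, posterior variance) of mu.
    """
    n = len(x)
    total = sum(x)  # n * xbar; equals 0 for an empty cluster
    post_mean = (kappa * m + total) / (kappa + n)
    post_var = sigma2 / (kappa + n)  # sigma2 cancels neatly: conjugacy
    return post_mean, post_var


def gibbs_draw_mean(x: list[float], sigma2: float, m: float, kappa: float,
                    rng: random.Random) -> float:
    """One Gibbs step: sample mu from its full conditional."""
    mean, var = conjugate_mean_update(x, sigma2, m, kappa)
    return rng.gauss(mean, var ** 0.5)
```

For example, with data `[1.0, 3.0, 2.0, 2.0]`, `sigma2 = 1`, `m = 0` and `kappa = 1`, the posterior mean is (0 + 8) / 5 = 1.6 with variance 0.2; raising `kappa` to 1000 pulls the mean to about 0.008, essentially `m`. An empty cluster simply returns the prior, which is what a Gibbs sampler needs when proposing new clusters.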