From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering.

Author information

Frühwirth-Schnatter Sylvia, Malsiner-Walli Gertraud

Affiliation

Institute for Statistics and Mathematics, Vienna University of Economics and Business (WU), Welthandelsplatz 1, 1020 Vienna, Austria.

Publication information

Adv Data Anal Classif. 2019;13(1):33-64. doi: 10.1007/s11634-018-0329-y. Epub 2018 Aug 24.

Abstract

In model-based clustering, mixture models are used to group data points into clusters. A useful concept introduced for Gaussian mixtures by Malsiner-Walli et al. (Stat Comput 26:303-324, 2016) is that of sparse finite mixtures, where the prior distribution on the weight distribution of a mixture with K components is chosen in such a way that a priori the number of clusters in the data is random and is allowed to be smaller than K with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it is illustrated that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyper prior is considered for the parameters determining the weight distribution. By suitable matching of these priors, it is shown that the choice of this hyper prior is far more influential on the cluster solution than whether a sparse finite mixture or a Dirichlet process mixture is taken into consideration.
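The effect of the weight prior on the a priori number of clusters can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a symmetric Dirichlet prior with a fixed parameter e0 on the K weights (no hyper prior) and integrates the weights out, so that observations are allocated sequentially with probability proportional to the current component count plus e0. The function names and the values of K, e0 and N are illustrative assumptions.

```python
import random
from collections import Counter


def sample_num_clusters(K, e0, N, rng):
    """Draw the number of non-empty components among K for N observations.

    Assumes weights ~ symmetric Dirichlet(e0, ..., e0). The weights are
    integrated out, which gives a Polya-urn allocation scheme:
    P(next observation joins component k) is proportional to n_k + e0.
    """
    counts = [0] * K
    for _ in range(N):
        weights = [n + e0 for n in counts]
        k = rng.choices(range(K), weights=weights)[0]
        counts[k] += 1
    return sum(1 for n in counts if n > 0)


def prior_of_num_clusters(K, e0, N, draws=500, seed=0):
    """Monte Carlo estimate of the prior distribution of the cluster count."""
    rng = random.Random(seed)
    tally = Counter(sample_num_clusters(K, e0, N, rng) for _ in range(draws))
    return {k: tally[k] / draws for k in sorted(tally)}


if __name__ == "__main__":
    # Illustrative values only: a small e0 (sparse) versus a large e0.
    for e0 in (0.01, 4.0):
        print(e0, prior_of_num_clusters(K=10, e0=e0, N=100))
```

Under these assumptions, a small e0 places most prior mass on cluster counts well below K, while a large e0 tends to fill all K components. This is the sense in which the number of clusters is random a priori and can be smaller than K.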

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/49c0/6448299/d24efdf78186/11634_2018_329_Fig1_HTML.jpg