Overfitting Bayesian Mixture Models with an Unknown Number of Components.

Author information

van Havre Zoé, White Nicole, Rousseau Judith, Mengersen Kerrie

Affiliations

School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia; CEREMADE, Université Paris Dauphine, Paris, France.

School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia.

Publication information

PLoS One. 2015 Jul 15;10(7):e0131739. doi: 10.1371/journal.pone.0131739. eCollection 2015.

DOI:10.1371/journal.pone.0131739

PMID:26177375

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4503697/

Abstract

This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov Chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via a Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Zmix can accurately estimate the number of components, posterior parameter estimates and allocation probabilities given a sufficiently large sample size. The results will reflect uncertainty in the final model and will report the range of possible candidate models and their respective estimated probabilities from a single run. Label switching is resolved with a computationally light-weight method, Zswitch, developed for overfitted mixtures by exploiting the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies are included to illustrate Zmix and Zswitch, as well as three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
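The overfitting idea in the abstract can be illustrated with a minimal sketch. It assumes a plain Gibbs sampler for a univariate Gaussian mixture with a sparse symmetric Dirichlet prior on the weights. It is not the Zmix implementation: it omits prior parallel tempering and Zswitch, and the function name `overfitted_gmm_gibbs`, the hyperparameters (`alpha`, `m0`, `v0`, `a0`, `b0`), and the use of the number of occupied components as a summary are all assumptions made for illustration.

```python
import numpy as np

def overfitted_gmm_gibbs(y, K=6, alpha=0.01, n_iter=400, burn_in=200, seed=0):
    """Gibbs sampler for a deliberately overfitted K-component Gaussian mixture.

    Returns the post-burn-in frequency of each number of occupied components.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    # Hypothetical weakly informative hyperparameters (not taken from the paper).
    m0, v0 = y.mean(), 10.0 * y.var()   # Normal prior on each component mean
    a0, b0 = 2.0, y.var()               # Inverse-gamma prior on each variance
    mu = rng.choice(y, size=K, replace=False)
    sigma2 = np.full(K, y.var())
    w = np.full(K, 1.0 / K)
    occupied = []
    for it in range(n_iter):
        # 1. Sample each allocation z_i from its full conditional.
        logp = (np.log(np.maximum(w, 1e-300)) - 0.5 * np.log(sigma2)
                - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        cdf = np.cumsum(p, axis=1)
        cdf = cdf / cdf[:, -1:]          # last column becomes exactly 1
        z = (cdf > rng.random((n, 1))).argmax(axis=1)
        counts = np.bincount(z, minlength=K)
        # 2. Sample weights; a small alpha pushes empty groups towards zero weight.
        w = rng.dirichlet(alpha + counts)
        # 3. Sample each mean and variance from its conditional posterior.
        for k in range(K):
            yk = y[z == k]
            prec = 1.0 / v0 + counts[k] / sigma2[k]
            mean = (m0 / v0 + yk.sum() / sigma2[k]) / prec
            mu[k] = rng.normal(mean, np.sqrt(1.0 / prec))
            rate = b0 + 0.5 * np.sum((yk - mu[k]) ** 2)
            sigma2[k] = 1.0 / rng.gamma(a0 + 0.5 * counts[k], 1.0 / rate)
        if it >= burn_in:
            occupied.append(int(np.count_nonzero(counts)))
    values, freq = np.unique(occupied, return_counts=True)
    return {int(v): float(f) / len(occupied) for v, f in zip(values, freq)}

# Simulated data from two well-separated Gaussian groups, fitted with K = 6.
demo_rng = np.random.default_rng(1)
y_demo = np.concatenate([demo_rng.normal(-5, 1, 150), demo_rng.normal(5, 1, 150)])
probs = overfitted_gmm_gibbs(y_demo)
print(probs)  # estimated probability of each number of occupied components
```

The returned dictionary gives, from a single run, a rough posterior distribution over the number of occupied components. A plain Gibbs sampler like this one can mix poorly, which is the limitation that the prior parallel tempering described in the abstract is meant to address.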

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23ca/4503697/0a00d944bc62/pone.0131739.g001.jpg

Similar articles

Part 2. Development of Enhanced Statistical Methods for Assessing Health Effects Associated with an Unknown Number of Major Sources of Multiple Air Pollutants.

Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):51-113.

Allocation Variable-Based Probabilistic Algorithm to Deal with Label Switching Problem in Bayesian Mixture Models.

PLoS One. 2015 Oct 12;10(10):e0138899. doi: 10.1371/journal.pone.0138899. eCollection 2015.

A probabilistic solution to the MEG inverse problem via MCMC methods: the reversible jump and parallel tempering algorithms.

IEEE Trans Biomed Eng. 2001 May;48(5):533-42. doi: 10.1109/10.918592.

Bayesian mixture models of variable dimension for image segmentation.

Comput Methods Programs Biomed. 2009 Apr;94(1):1-14. doi: 10.1016/j.cmpb.2008.05.010. Epub 2008 Nov 25.

A gradient Markov chain Monte Carlo algorithm for computing multivariate maximum likelihood estimates and posterior distributions: mixture dose-response assessment.

Risk Anal. 2012 Feb;32(2):345-59. doi: 10.1111/j.1539-6924.2011.01672.x. Epub 2011 Sep 11.

A Monte Carlo Metropolis-Hastings algorithm for sampling from distributions with intractable normalizing constants.

Neural Comput. 2013 Aug;25(8):2199-234. doi: 10.1162/NECO_a_00466. Epub 2013 Apr 22.

Dealing with Reflection Invariance in Bayesian Factor Analysis.

Psychometrika. 2017 Jun;82(2):295-307. doi: 10.1007/s11336-017-9564-y. Epub 2017 Mar 13.

Finite mixture varying coefficient models for analyzing longitudinal heterogenous data.

Stat Med. 2012 Mar 15;31(6):544-60. doi: 10.1002/sim.4420. Epub 2011 Dec 9.

Unsupervised learning of gaussian mixtures based on variational component splitting.

IEEE Trans Neural Netw. 2007 May;18(3):745-55. doi: 10.1109/TNN.2006.891114.

Cited by

BAYESIAN LEARNING OF CLINICALLY MEANINGFUL SEPSIS PHENOTYPES IN NORTHERN TANZANIA.

Ann Appl Stat. 2025 Sep;19(3):2193-2217. doi: 10.1214/25-aoas2045. Epub 2025 Aug 28.

Disentangling Qualitatively Different Faking Strategies in High-Stakes Personality Assessments: A Mixture Extension of the Multidimensional Nominal Response Model.

Educ Psychol Meas. 2025 Jul 29:00131644251341843. doi: 10.1177/00131644251341843.

VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data.

Bioinform Adv. 2025 Mar 17;5(1):vbaf055. doi: 10.1093/bioadv/vbaf055. eCollection 2025.

Derivation of outcome-dependent dietary patterns for low-income women obtained from survey data using a supervised weighted overfitted latent class analysis.

Biometrics. 2024 Oct 3;80(4). doi: 10.1093/biomtc/ujae122.

Identifying dietary consumption patterns from survey data: a Bayesian nonparametric latent class model.

J R Stat Soc Ser A Stat Soc. 2023 Dec 12;187(2):496-512. doi: 10.1093/jrsssa/qnad135. eCollection 2024 Apr.

BELMM: Bayesian model selection and random walk smoothing in time-series clustering.

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad686.

Bayesian cluster analysis.

Philos Trans A Math Phys Eng Sci. 2023 May 15;381(2247):20220149. doi: 10.1098/rsta.2022.0149. Epub 2023 Mar 27.

Racial and ethnic heterogeneity in diets of low-income adult females in the United States: results from National Health and Nutrition Examination Surveys from 2011 to 2018.

Am J Clin Nutr. 2023 Mar;117(3):625-634. doi: 10.1016/j.ajcnut.2023.01.008.

Adaptability and stability of Coffea canephora to dynamic environments using the Bayesian approach.

Sci Rep. 2022 Jul 8;12(1):11608. doi: 10.1038/s41598-022-15190-x.

PyClone-VI: scalable inference of clonal population structures using whole genome data.

BMC Bioinformatics. 2020 Dec 10;21(1):571. doi: 10.1186/s12859-020-03919-2.

References

Probabilistic subgroup identification using Bayesian finite mixture modelling: a case study in Parkinson's disease phenotype identification.

Stat Methods Med Res. 2012 Dec;21(6):563-83. doi: 10.1177/0962280210391012. Epub 2010 Dec 16.

Parallel tempering: theory, applications, and new perspectives.

Phys Chem Chem Phys. 2005 Dec 7;7(23):3910-6. doi: 10.1039/b509983h.

Fully Bayesian mixture model for differential gene expression: simulations and model checks.

Stat Appl Genet Mol Biol. 2007;6:Article36. doi: 10.2202/1544-6115.1314. Epub 2007 Dec 21.

A Dirichlet process mixture model for brain MRI tissue classification.

Med Image Anal. 2007 Apr;11(2):169-82. doi: 10.1016/j.media.2006.12.002. Epub 2006 Dec 21.

Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference.

Bioinformatics. 2004 Feb 12;20(3):407-15. doi: 10.1093/bioinformatics/btg427. Epub 2004 Jan 22.

Replica Monte Carlo simulation of spin glasses.

Phys Rev Lett. 1986 Nov 24;57(21):2607-2609. doi: 10.1103/PhysRevLett.57.2607.

A population and family study of N-acetyltransferase using caffeine urinary metabolites.

Clin Pharmacol Ther. 1993 Aug;54(2):134-41. doi: 10.1038/clpt.1993.124.

Econometric mixture models and more general models for unobservables in duration analysis.

Stat Methods Med Res. 1994;3(3):279-99. doi: 10.1177/096228029400300306.
