Efficient model selection for mixtures of probabilistic PCA via hierarchical BIC.

Publication information

IEEE Trans Cybern. 2014 Oct;44(10):1871-83. doi: 10.1109/TCYB.2014.2298401.

Abstract

This paper concerns model selection for mixtures of probabilistic principal component analyzers (MPCA). The well-known Bayesian information criterion (BIC) is frequently used for this purpose. However, BIC is found to penalize each analyzer implausibly, using the whole sample size. In this paper, we present a new criterion for MPCA, called hierarchical BIC, in which each analyzer is penalized using only its own effective sample size. Theoretically, hierarchical BIC is a large-sample approximation of the variational Bayesian lower bound, and BIC is a further approximation of hierarchical BIC. To learn hierarchical-BIC-based MPCA, we propose two efficient algorithms: a two-stage and a one-stage variant. The two-stage algorithm integrates model selection with respect to the subspace dimensions into parameter estimation, and the one-stage variant further integrates the selection of the number of mixture components into a single algorithm. Experiments on a number of synthetic and real-world data sets show that: 1) hierarchical BIC is more accurate than BIC and several related competitors and 2) the two proposed algorithms are not only effective but also much more efficient than the classical two-stage procedure commonly used for BIC.
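The difference between the two penalties can be illustrated with a minimal sketch. The abstract does not state the formula, so the form below is an assumption consistent with its description: the mixing proportions are charged log N, while analyzer k is charged log(N * pi_k), with N * pi_k taken as its effective sample size. The function names and the per-analyzer parameter count (mean, loading matrix up to rotation, isotropic noise variance) are illustrative choices, not taken from the paper.

```python
import math


def ppca_free_params(d, q):
    """Free parameters of one PPCA analyzer in d dimensions with q factors:
    mean (d), loading matrix up to rotation (d*q - q*(q-1)/2), noise variance (1)."""
    return d + d * q - q * (q - 1) // 2 + 1


def bic_penalty(n, d, qs):
    """Standard BIC penalty: every free parameter is charged log(n)."""
    k = len(qs)
    total = (k - 1) + sum(ppca_free_params(d, q) for q in qs)
    return 0.5 * total * math.log(n)


def hierarchical_bic_penalty(n, d, qs, weights):
    """Assumed hierarchical form: the k-1 mixing proportions are charged log(n);
    analyzer j is charged log(n * weights[j]), its effective sample size."""
    k = len(qs)
    penalty = 0.5 * (k - 1) * math.log(n)
    for q, w in zip(qs, weights):
        penalty += 0.5 * ppca_free_params(d, q) * math.log(n * w)
    return penalty


if __name__ == "__main__":
    # Hypothetical setting: 1000 points in 10 dimensions, two analyzers.
    n, d, qs, weights = 1000, 10, [2, 3], [0.7, 0.3]
    print("BIC penalty:             ", bic_penalty(n, d, qs))
    print("hierarchical BIC penalty:", hierarchical_bic_penalty(n, d, qs, weights))
```

Under this assumed form, the two penalties coincide for a single analyzer, and the hierarchical penalty is smaller whenever an analyzer holds less than the full sample, which matches the abstract's claim that BIC charges each analyzer for data it does not model.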

Similar articles

A comparative investigation on subspace dimension determination.

Neural Netw. 2004 Oct-Nov;17(8-9):1051-9. doi: 10.1016/j.neunet.2004.07.005.

Bayesian information criterion for longitudinal and clustered data.

Stat Med. 2011 Nov 10;30(25):3050-6. doi: 10.1002/sim.4323. Epub 2011 Jul 29.

Model selection for mixtures of mutagenetic trees.

Stat Appl Genet Mol Biol. 2006;5:Article17. doi: 10.2202/1544-6115.1164. Epub 2006 Jun 23.

Modified principal component analysis: an integration of multiple similarity subspace models.

IEEE Trans Neural Netw Learn Syst. 2014 Aug;25(8):1538-52. doi: 10.1109/TNNLS.2013.2294492.

Value of sample size for computation of the Bayesian information criterion (BIC) in multilevel modeling.

Behav Res Methods. 2019 Feb;51(1):440-450. doi: 10.3758/s13428-018-1188-3.

Bilinear probabilistic principal component analysis.

IEEE Trans Neural Netw Learn Syst. 2012 Mar;23(3):492-503. doi: 10.1109/TNNLS.2012.2183006.

Variable selection for high dimensional multivariate outcomes.

Stat Sin. 2014 Oct;24(4):1633-1654. doi: 10.5705/ss.2013.019.

The effective sample size in Bayesian information criterion for level-specific fixed and random-effect selection in a two-level nested model.

Br J Math Stat Psychol. 2024 May;77(2):289-315. doi: 10.1111/bmsp.12327. Epub 2023 Dec 1.

The cross-validated AUC for MCP-logistic regression with high-dimensional data.

Stat Methods Med Res. 2013 Oct;22(5):505-18. doi: 10.1177/0962280211428385. Epub 2011 Nov 28.

Cited by

Unsupervised fake news detection on social media using hybrid Gaussian Mixture Model.

PLoS One. 2025 Aug 18;20(8):e0330421. doi: 10.1371/journal.pone.0330421. eCollection 2025.

Class Enumeration and Parameter Recovery of Growth Mixture Modeling and Second-Order Growth Mixture Modeling in the Presence of Measurement Noninvariance between Latent Classes.

Front Psychol. 2017 Sep 5;8:1499. doi: 10.3389/fpsyg.2017.01499. eCollection 2017.

Dimension reduction techniques for the integrative analysis of multi-omics data.

Brief Bioinform. 2016 Jul;17(4):628-41. doi: 10.1093/bib/bbv108. Epub 2016 Mar 11.
