IEEE Trans Neural Netw Learn Syst. 2013 Mar;24(3):485-97. doi: 10.1109/TNNLS.2012.2234134.
Principal component analysis (PCA) is a widely used model for dimensionality reduction. In this paper, we address the problem of determining the intrinsic dimensionality of a data population of a general type by selecting the number of principal components for a generalized PCA model. In particular, we propose a generalized Bayesian PCA model that handles general-type data by employing exponential family distributions. Model selection is realized by empirical Bayesian inference of the model. We name the model simple exponential family PCA (SePCA), since it embraces both the principle of using a simple model for data representation and the practice of using a simplified computational procedure for the inference. Our analysis shows that the empirical Bayesian inference in SePCA formally realizes an intuitive criterion for PCA model selection: a preserved principal component must sufficiently correlate with data variance that is uncorrelated with the other principal components. Experiments on synthetic and real data sets demonstrate the effectiveness of SePCA and exemplify its characteristics for model selection.
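The sketch below is only a minimal, hypothetical illustration of the stated selection criterion for the ordinary Gaussian case, not the SePCA algorithm or its empirical Bayesian inference. It assumes that each eigenvalue of the sample covariance measures variance along a direction uncorrelated with the other principal components, and keeps a component only if that variance sufficiently exceeds the residual noise level; the function name `select_num_components` and the `excess` margin are illustrative assumptions.

```python
# Hypothetical sketch of the model-selection intuition above (not SePCA itself):
# keep a principal component only if the variance it explains clearly exceeds
# the average variance left unexplained by it and the components before it.
import numpy as np

def select_num_components(X, excess=2.0):
    """Return a number of principal components to keep for data matrix X (n x d).

    Component j is retained if its eigenvalue exceeds `excess` times the mean
    of the remaining (assumed-noise) eigenvalues. Both the rule and the margin
    are illustrative assumptions, not part of the SePCA model.
    """
    cov = np.cov(X, rowvar=False)                 # sample covariance (d x d)
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, descending
    d = len(eigvals)
    k = 0
    for j in range(d - 1):                        # leave at least one noise dim
        noise = eigvals[j + 1:].mean()            # variance not explained so far
        if eigvals[j] > excess * noise:
            k = j + 1
        else:
            break
    return k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) * 3.0
    X = signal + rng.normal(size=(500, 10))       # 3 strong components in 10-dim noise
    print(select_num_components(X))               # expected to report about 3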