IEEE Trans Neural Netw Learn Syst. 2013 Jun;24(6):964-76. doi: 10.1109/TNNLS.2013.2245341.
Expressing data as linear functions of a small number of unknown variables is a useful approach employed by several classical data analysis methods, e.g., factor analysis, principal component analysis, or latent semantic indexing. These models represent the data using the product of two factors. In practice, one important concern is how to link the learned factors to relevant quantities in the context of the application. To this end, various specialized forms of the factors have been proposed to improve interpretability. Toward developing a unified view and clarifying the statistical significance of the specialized factors, we propose a Bayesian model family. We employ exponential family distributions to specify various types of factors, which provide a unified probabilistic formulation. A Gibbs sampling procedure is constructed as a general computation routine. We verify the model by experiments, in which the proposed model is shown to be effective in both emulating existing models and motivating new model designs for particular problem settings.
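To make the idea of a two-factor model with a Gibbs sampling routine concrete, here is a minimal sketch of the simplest special case only: Gaussian noise and Gaussian priors on both factors. It is not the authors' model family, which covers general exponential family factors. The function name `gibbs_gaussian_factorization`, the symbols `A` and `B` for the two factors, and the hyperparameters `noise_var` and `prior_var` are assumptions introduced for illustration and do not come from the paper.

```python
import numpy as np


def gibbs_gaussian_factorization(X, k, n_iter=200, burn_in=100,
                                 noise_var=0.01, prior_var=1.0, seed=0):
    """Gibbs sampler for the illustrative model X ~ N(A @ B, noise_var),
    with i.i.d. N(0, prior_var) priors on the entries of A (n x k) and
    B (k x m). Returns the posterior-mean estimate of the product A @ B.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.normal(size=(n, k))
    B = rng.normal(size=(k, m))
    mean_product = np.zeros((n, m))
    kept = 0

    for it in range(n_iter):
        # Given B, the rows of A are independent Gaussians that share
        # one k x k posterior covariance.
        cov_A = np.linalg.inv(B @ B.T / noise_var + np.eye(k) / prior_var)
        mean_A = (X @ B.T / noise_var) @ cov_A
        chol_A = np.linalg.cholesky(cov_A)
        A = mean_A + rng.normal(size=(n, k)) @ chol_A.T

        # Given A, the columns of B are independent Gaussians that share
        # one k x k posterior covariance.
        cov_B = np.linalg.inv(A.T @ A / noise_var + np.eye(k) / prior_var)
        mean_B = cov_B @ (A.T @ X / noise_var)
        chol_B = np.linalg.cholesky(cov_B)
        B = mean_B + chol_B @ rng.normal(size=(k, m))

        # Average the reconstruction over post-burn-in samples. The
        # product A @ B is averaged rather than A and B separately,
        # because the individual factors are only identified up to an
        # invertible transformation.
        if it >= burn_in:
            mean_product += A @ B
            kept += 1

    return mean_product / kept
```

Calling `gibbs_gaussian_factorization(X, k=3)` on an `n x m` data matrix returns an `n x m` reconstruction. In this sketch, replacing the Gaussian priors with other exponential family distributions would change the two conditional sampling steps but leave the alternating structure of the loop intact.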