Johndrow James E, Bhattacharya Anirban, Dunson David B
Duke University.
Texas A&M University.
Ann Stat. 2017;45(1):1-38. doi: 10.1214/15-AOS1414. Epub 2017 Feb 21.
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced-rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions that bridges existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
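The abstract's statement that latent structure models yield a reduced-rank factorization of the probability mass function can be illustrated with a minimal sketch. It assumes a simple latent class model with three categorical variables and two latent classes. The names `nu`, `lam`, and `pi`, the category counts, and the Dirichlet draws are illustrative choices, not taken from the paper. The sketch shows only the standard PARAFAC form of a latent class model, not the collapsed Tucker decomposition the authors propose.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 2                # number of latent classes (assumed for illustration)
levels = (3, 4, 2)   # number of categories per variable (assumed)

# Mixing weights over the latent classes; they sum to one.
nu = rng.dirichlet(np.ones(k))

# lam[j][h] is the conditional distribution of variable j given class h.
# Each lam[j] has shape (k, levels[j]) and each row sums to one.
lam = [rng.dirichlet(np.ones(d), size=k) for d in levels]

# Probability tensor of the observed variables:
# pi[a, b, c] = sum_h nu[h] * lam[0][h, a] * lam[1][h, b] * lam[2][h, c]
pi = np.einsum("h,ha,hb,hc->abc", nu, lam[0], lam[1], lam[2])

# pi is a valid probability mass function.
total = pi.sum()

# Unfolding along the first variable gives a 3 x 8 matrix whose rank
# cannot exceed the number of latent classes.
unfolded_rank = np.linalg.matrix_rank(pi.reshape(levels[0], -1))

print("total probability:", total)
print("rank of first unfolding:", unfolded_rank, "(at most", k, ")")
```

Because `pi` is a sum of `k` nonnegative rank-one tensors, its nonnegative rank is at most `k`, which is the sense in which a latent class model reduces dimensionality. A log-linear model would instead restrict which interaction terms of `log(pi)` are nonzero.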