Dunson David B, Xing Chuanhua
Department of Statistical Science, Duke University, Durham, NC 27705.
J Am Stat Assoc. 2009 Jan 1;104(487):1042-1051. doi: 10.1198/jasa.2009.tm08439.
Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.
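To make the phrase "Dirichlet process mixture of product multinomial distributions" concrete, the following is a minimal generative sketch, not the authors' implementation. It assumes a truncated stick-breaking representation of the Dirichlet process and uniform Dirichlet priors on the within-component category probabilities. The truncation level H, the hyperparameter values, and all function names are illustrative assumptions. Within a mixture component the variables are drawn independently; dependence between variables arises only from mixing over components.

```python
import random


def stick_breaking(alpha, H, rng):
    """Truncated stick-breaking weights: w_h = V_h * prod_{l<h} (1 - V_l)."""
    weights, remaining = [], 1.0
    for _ in range(H - 1):
        v = rng.betavariate(1.0, alpha)  # V_h ~ Beta(1, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


def dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_dp_product_multinomial(n, levels, alpha=1.0, H=20, seed=0):
    """Simulate n records of unordered categorical variables.

    levels[j] is the number of categories of variable j.
    H is an assumed truncation level, not a value taken from the article.
    """
    rng = random.Random(seed)
    weights = stick_breaking(alpha, H, rng)
    # psi[h][j]: category probabilities of variable j in component h
    psi = [[dirichlet([1.0] * d, rng) for d in levels] for _ in range(H)]
    data, labels = [], []
    for _ in range(n):
        h = rng.choices(range(H), weights=weights)[0]  # latent class
        # Given the class, each variable is an independent multinomial draw.
        row = [rng.choices(range(d), weights=psi[h][j])[0]
               for j, d in enumerate(levels)]
        data.append(row)
        labels.append(h)
    return data, labels, weights


if __name__ == "__main__":
    # Hypothetical example: 5 motif positions, 4 nucleotides each.
    data, labels, weights = sample_dp_product_multinomial(10, [4] * 5)
    for row, h in zip(data, labels):
        print(h, row)
```

For posterior computation one would condition on observed data and update the latent class labels, weights, and category probabilities, for example with a Gibbs sampler; the sketch covers only forward simulation from the prior.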