Yang Yun, Dunson David B
Department of Statistical Science, Duke University, NC 27708.
J Am Stat Assoc. 2016;111(514):656-669. doi: 10.1080/01621459.2015.1029129. Epub 2016 Aug 18.
In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goals of building a parsimonious classification model and making inferences on the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully structured Tucker factorization, we define a model that can characterize any conditional probability while facilitating variable selection and modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation that accommodates uncertainty in which predictors are included. Under near low-rank assumptions, the posterior distribution for the conditional probability is shown to achieve close to the parametric rate of contraction even in ultra-high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications.
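The abstract does not spell out the factorization, so the following is only a minimal sketch of what a Tucker-style conditional probability could look like, under stated assumptions: each predictor j maps its observed category to weights over k_j latent classes, a core tensor holds a response distribution for every combination of latent classes, and setting k_j = 1 makes predictor j drop out of the model. The names conditional_prob, lam, pis, and random_simplex are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def random_simplex(rng, shape):
    """Random non-negative array whose last axis sums to 1 (illustrative helper)."""
    w = rng.random(shape) + 0.1
    return w / w.sum(axis=-1, keepdims=True)


def conditional_prob(lam, pis, x):
    """Sketch of P(y | x) under an assumed Tucker-style factorization.

    lam: core tensor of shape (k_1, ..., k_p, d0); last axis sums to 1.
    pis: list of p arrays; pis[j] has shape (d_j, k_j) and each row sums to 1.
    x:   sequence of p observed category indices.
    """
    out = lam
    for j, xj in enumerate(x):
        # Contract the leading latent axis with the class weights for x_j.
        out = np.tensordot(pis[j][xj], out, axes=(0, 0))
    return out  # shape (d0,)


rng = np.random.default_rng(0)
d0 = 2                # number of response categories (assumed)
levels = [3, 3, 3]    # categories per predictor (assumed)
ks = [2, 1, 2]        # k = 1 for the second predictor, so it is excluded

lam = random_simplex(rng, (*ks, d0))
pis = [random_simplex(rng, (d, k)) for d, k in zip(levels, ks)]

probs = conditional_prob(lam, pis, [0, 1, 2])
probs_a = conditional_prob(lam, pis, [0, 0, 2])
probs_b = conditional_prob(lam, pis, [0, 2, 2])
```

In this sketch probs is a valid probability vector over the response categories, and probs_a equals probs_b because the second predictor has a single latent class. That is one way variable selection can be expressed through the ranks of the factorization.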