Bhattacharya A, Dunson D B
Department of Statistical Science, Duke University, Durham, North Carolina 27708-0251, U.S.A.
Biometrika. 2011 Jun;98(2):291-306. doi: 10.1093/biomet/asr013.
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data.
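The abstract describes the multiplicative gamma process prior only in words. The following is a minimal sketch of one draw from a truncated version of that prior, assuming the usual parametrization: column precisions tau_h are cumulative products of gamma variables delta_l, and each loading has an additional local gamma precision phi_jh. The function names, the truncation level `k`, and the default hyperparameter values `a1`, `a2` and `nu` are illustrative assumptions, not values taken from this abstract.

```python
import numpy as np


def sample_mgp_loadings(p, k, a1=2.0, a2=3.0, nu=3.0, seed=0):
    """Draw a p x k loading matrix from a truncated multiplicative gamma
    process prior (illustrative sketch; hyperparameter defaults are assumed)."""
    rng = np.random.default_rng(seed)

    # delta_1 ~ Gamma(a1, 1), delta_l ~ Gamma(a2, 1) for l >= 2
    delta = np.empty(k)
    delta[0] = rng.gamma(shape=a1, scale=1.0)
    delta[1:] = rng.gamma(shape=a2, scale=1.0, size=k - 1)

    # Column precision tau_h = prod_{l <= h} delta_l
    tau = np.cumprod(delta)

    # Local precisions phi_jh ~ Gamma(nu / 2, rate = nu / 2)
    phi = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=(p, k))

    # lambda_jh ~ N(0, 1 / (phi_jh * tau_h)); tau broadcasts across rows
    loadings = rng.normal(size=(p, k)) / np.sqrt(phi * tau)
    return loadings, tau


def implied_covariance(loadings, sigma2):
    """Covariance implied by the factor model: Lambda Lambda^T + diag(sigma2)."""
    return loadings @ loadings.T + np.diag(sigma2)


if __name__ == "__main__":
    lam, tau = sample_mgp_loadings(p=20, k=6, seed=1)
    omega = implied_covariance(lam, np.ones(20))
    print("column precisions:", np.round(tau, 2))
    print("column norms of loadings:", np.round(np.linalg.norm(lam, axis=0), 2))
    print("covariance shape:", omega.shape)
```

When `a2` is greater than 1, the column precisions tend to grow with the column index, so later columns of the loading matrix are typically closer to zero. This is the increasing shrinkage the abstract refers to. A single draw need not be monotone, since the behaviour holds only in a stochastic sense.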