Artin Armagan, David Dunson
Department of Statistical Science, Duke University, Durham, NC 27708.
Stat Probab Lett. 2011 Aug 1;81(8):1056-1062. doi: 10.1016/j.spl.2011.02.029.
It is increasingly common to face longitudinal or multilevel data sets with large numbers of predictors and/or a large sample size. Current methods of fitting and inference for mixed-effects models tend to perform poorly in such settings. When there are many variables, it is appealing to allow for uncertainty in subset selection and to obtain a sparse characterization of the data. Bayesian methods can address these goals using Markov chain Monte Carlo (MCMC), but MCMC is computationally expensive and can be infeasible in large-p and/or large-n problems. As a fast approximate Bayes solution, we recommend a novel variational approximation to the posterior of the parameters in a decomposition of the variance components, with priors chosen to obtain a sparse solution that allows selection of random effects. The method is evaluated through a simulation study and illustrated with an epidemiological application.
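To make the idea of a variational (mean-field) posterior approximation concrete, here is a minimal sketch of coordinate-ascent variational inference for a one-way random-intercept model, y_ij = mu + b_i + e_ij, with Gamma priors on the two precisions. This is a generic textbook illustration, not the authors' method: it has no variance-component decomposition and no sparsity-inducing priors, so it cannot select random effects. The function name `cavi_random_intercept`, the prior hyperparameters `a0`, `r0`, `s0_sq`, and the simulated data are all hypothetical.

```python
import numpy as np


def cavi_random_intercept(y, group, n_iter=200, a0=1e-2, r0=1e-2, s0_sq=1e6):
    """Mean-field VB for y = mu + b[group] + e (hypothetical illustration).

    Assumed priors: mu ~ N(0, s0_sq), b_i ~ N(0, 1/lam_b), e ~ N(0, 1/lam_e),
    lam_e, lam_b ~ Gamma(shape=a0, rate=r0).
    Assumed factorization: q(mu) * prod_i q(b_i) * q(lam_e) * q(lam_b).
    """
    y = np.asarray(y, dtype=float)
    group = np.asarray(group, dtype=int)
    G, N = group.max() + 1, y.size
    n_g = np.bincount(group, minlength=G)  # observations per group

    # Initialise variational parameters
    m_mu, v_mu = y.mean(), 1.0
    m_b, v_b = np.zeros(G), np.ones(G)
    E_lam_e, E_lam_b = 1.0, 1.0

    for _ in range(n_iter):
        # q(b_i) = N(m_b[i], v_b[i]), given E[mu], E[lam_e], E[lam_b]
        v_b = 1.0 / (n_g * E_lam_e + E_lam_b)
        m_b = E_lam_e * np.bincount(group, weights=y - m_mu, minlength=G) * v_b

        # q(mu) = N(m_mu, v_mu), given E[b] and E[lam_e]
        v_mu = 1.0 / (N * E_lam_e + 1.0 / s0_sq)
        m_mu = E_lam_e * np.sum(y - m_b[group]) * v_mu

        # q(lam_e) = Gamma; E[(y - mu - b_i)^2] = squared mean residual + variances
        exp_sq_resid = (y - m_mu - m_b[group]) ** 2 + v_mu + v_b[group]
        E_lam_e = (a0 + N / 2) / (r0 + 0.5 * exp_sq_resid.sum())

        # q(lam_b) = Gamma; E[b_i^2] = m_b^2 + v_b
        E_lam_b = (a0 + G / 2) / (r0 + 0.5 * np.sum(m_b**2 + v_b))

    # Plug-in variances 1/E[lam]; these are not the posterior means of the variances
    return {"mu": m_mu, "b": m_b,
            "sigma2_e": 1.0 / E_lam_e, "sigma2_b": 1.0 / E_lam_b}


# Usage on simulated data: 50 groups x 20 observations each
rng = np.random.default_rng(0)
G, n = 50, 20
group = np.repeat(np.arange(G), n)
b_true = rng.normal(0.0, 1.0, size=G)
y = 2.0 + b_true[group] + rng.normal(0.0, 0.5, size=G * n)

fit = cavi_random_intercept(y, group)
print(fit["mu"], fit["sigma2_e"], fit["sigma2_b"])
```

Every update is closed-form because each prior is conditionally conjugate, so one sweep costs O(N) operations. For simplicity the sketch runs a fixed number of sweeps instead of monitoring the evidence lower bound for convergence.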