Leach Justin M, Aban Inmaculada, Yi Nengjun
Department of Biostatistics, University of Alabama at Birmingham, School of Public Health, 1665 University Blvd, Birmingham, AL 35233, United States of America.
J Stat Plan Inference. 2022 Mar;217:141-152. doi: 10.1016/j.jspi.2021.07.010. Epub 2021 Jul 29.
Spike-and-slab priors model predictors as arising from a mixture of two distributions: one for predictors that should remain in the model (slab) and one for those that should not (spike). The spike-and-slab lasso (SSL) is a mixture of double exponentials, extending the single lasso penalty by imposing different penalties on parameters based on their inclusion probabilities. The SSL was extended to Generalized Linear Models (GLM) for application in genetics/genomics, and can handle many highly correlated predictors of a scalar outcome, but does not incorporate the relationships among predictors into variable selection. When images/spatial data are used to model a scalar outcome, relevant parameters tend to cluster spatially, and model performance may benefit from incorporating spatial structure into variable selection. We propose to incorporate spatial information by assigning intrinsic autoregressive priors to the logit prior probabilities of inclusion, which results in more similar shrinkage penalties among spatially adjacent parameters. Using MCMC to fit Bayesian models can be computationally prohibitive for large-scale data, so we instead fit the model by adapting a computationally efficient coordinate-descent-based EM algorithm. A simulation study and an application to Alzheimer's Disease imaging data show that incorporating spatial information can improve model fit.
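The link between inclusion probabilities and shrinkage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scale values s0 and s1, and the example coefficients are all hypothetical, and the intrinsic autoregressive prior itself is not modeled. The prior inclusion probability theta is simply taken as given for each parameter. The sketch assumes the usual EM treatment of a double-exponential mixture, in which the posterior inclusion probability follows from Bayes' rule and the effective lasso penalty is the expected inverse scale.

```python
import math


def double_exponential_pdf(beta, scale):
    """Density at beta of a double exponential (Laplace) centered at 0."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def inclusion_probability(beta, theta, s0, s1):
    """Posterior probability that beta came from the slab (scale s1)
    rather than the spike (scale s0), given prior inclusion probability theta.
    This is Bayes' rule applied to the two-component mixture."""
    slab = theta * double_exponential_pdf(beta, s1)
    spike = (1.0 - theta) * double_exponential_pdf(beta, s0)
    return slab / (slab + spike)


def effective_penalty(p, s0, s1):
    """Expected inverse scale, used as the lasso penalty weight (assumed EM form).
    p near 1 gives the weak slab penalty 1/s1; p near 0 gives the strong
    spike penalty 1/s0."""
    return (1.0 - p) / s0 + p / s1


# Illustrative scales only (assumption): narrow spike, wide slab.
s0, s1 = 0.05, 1.0

# Same coefficient, different prior inclusion probabilities, as might arise
# when neighboring parameters do or do not support inclusion.
for beta, theta in [(0.1, 0.2), (0.1, 0.5), (0.1, 0.8)]:
    p = inclusion_probability(beta, theta, s0, s1)
    print(f"beta={beta}, theta={theta}: p={p:.3f}, "
          f"penalty={effective_penalty(p, s0, s1):.2f}")
```

Under these assumptions, raising theta for a parameter raises its posterior inclusion probability and lowers its penalty, so a prior that pulls neighboring theta values toward each other also pulls their shrinkage penalties toward each other.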