IEEE/ACM Trans Comput Biol Bioinform. 2018 Mar-Apr;15(2):537-550. doi: 10.1109/TCBB.2015.2440244.
Ultra-high dimensional variable selection has become increasingly important in the analysis of neuroimaging data. For example, in the Autism Brain Imaging Data Exchange (ABIDE) study, neuroscientists are interested in identifying important biomarkers for early detection of autism spectrum disorder (ASD) using high-resolution brain images that include hundreds of thousands of voxels. However, most existing methods are not feasible for this problem because of their high computational cost. In this work, we propose a novel multiresolution variable selection procedure under a Bayesian probit regression framework. It recursively uses posterior samples from coarser-scale variable selection to guide posterior inference on finer-scale variable selection, leading to very efficient Markov chain Monte Carlo (MCMC) algorithms. The proposed algorithms are computationally feasible for ultra-high dimensional data. In addition, our model incorporates two levels of structural information into variable selection using Ising priors: the spatial dependence between voxels and the functional connectivity between anatomical brain regions. Applied to the resting-state functional magnetic resonance imaging (R-fMRI) data in the ABIDE study, our methods identify voxel-level imaging biomarkers that are highly predictive of ASD and are biologically meaningful and interpretable. Extensive simulations also show that our methods achieve better variable selection performance than existing methods.
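To make the role of an Ising prior in variable selection concrete, the following is a minimal sketch, not the model from the paper. It assumes binary inclusion indicators `gamma` (1 means a voxel is selected), a single neighborhood graph given as an edge list, and one common parameterization in which the unnormalized log prior is `a * sum(gamma_i) + b * sum(gamma_i * gamma_j)` over neighboring pairs. The function names and the parameters `a` (sparsity) and `b` (smoothness) are hypothetical, and the abstract's second level of structure (connectivity between brain regions) is omitted.

```python
import math


def ising_log_prior(gamma, edges, a, b):
    """Unnormalized log prior of an inclusion vector under an assumed Ising form.

    gamma: list of 0/1 indicators, one per voxel.
    edges: list of (i, j) pairs of neighboring voxels, each pair listed once.
    a:     sparsity parameter (negative values favor excluding voxels).
    b:     smoothness parameter (positive values favor selecting neighbors together).
    """
    sparsity = a * sum(gamma)
    smoothness = b * sum(gamma[i] * gamma[j] for i, j in edges)
    return sparsity + smoothness


def conditional_inclusion_prob(i, gamma, edges, a, b):
    """Prior probability that voxel i is selected given all other indicators.

    Under the form above this is a logistic function of the number of
    selected neighbors, which is what a single-site Gibbs update would use
    before the likelihood contribution is added.
    """
    selected_neighbors = 0
    for u, v in edges:
        if u == i:
            selected_neighbors += gamma[v]
        elif v == i:
            selected_neighbors += gamma[u]
    return 1.0 / (1.0 + math.exp(-(a + b * selected_neighbors)))


# Toy example: four voxels on a chain, the first two currently selected.
edges = [(0, 1), (1, 2), (2, 3)]
gamma = [1, 1, 0, 0]
a, b = -2.0, 1.0

log_prior = ising_log_prior(gamma, edges, a, b)
p_voxel_2 = conditional_inclusion_prob(2, gamma, edges, a, b)  # one selected neighbor
p_voxel_3 = conditional_inclusion_prob(3, gamma, edges, a, b)  # no selected neighbors
```

In this toy setting, voxel 2 has a higher prior inclusion probability than voxel 3 because it sits next to a selected voxel, which is the kind of spatial dependence the prior is meant to encode.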