Ruffieux Hélène, Davison Anthony C, Hager Jörg, Inshaw Jamie, Fairfax Benjamin P, Richardson Sylvia, Bottolo Leonardo
MRC Biostatistics Unit, University of Cambridge.
Ecole Polytechnique Fédérale de Lausanne (EPFL).
Ann Appl Stat. 2020 Jun;14(2):905-928. doi: 10.1214/20-AOAS1332. Epub 2020 Jun 29.
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions 10^3-10^5 in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
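The global-local behaviour mentioned in the abstract can be illustrated in a much simpler setting than the paper's model. The sketch below is an assumption-laden toy, not the authors' multi-response framework or their variational algorithm: it takes a single observation y ~ N(beta, 1), a prior beta ~ N(0, tau^2 lambda^2) with a half-Cauchy(0, 1) local scale lambda, and a fixed global scale tau. It then computes the posterior mean of beta by numerical integration over lambda. The function name `horseshoe_posterior_mean` and the grid size are hypothetical choices made for illustration.

```python
import numpy as np


def horseshoe_posterior_mean(y, tau, n_grid=20000):
    """Posterior mean of beta when y ~ N(beta, 1), beta ~ N(0, tau^2 lam^2),
    and lam ~ half-Cauchy(0, 1), with the global scale tau held fixed."""
    # Substituting lam = tan(u) turns the half-Cauchy prior into a uniform
    # prior for u on (0, pi/2); a midpoint grid avoids the endpoint pi/2.
    u = (np.arange(n_grid) + 0.5) * (np.pi / 2) / n_grid
    lam = np.tan(u)

    # Given lam, y is marginally N(0, 1 + tau^2 lam^2).
    marg_var = 1.0 + (tau * lam) ** 2
    weights = np.exp(-0.5 * y**2 / marg_var) / np.sqrt(marg_var)

    # Given lam, E[beta | y, lam] = (1 - kappa) * y with kappa = 1 / marg_var.
    keep = 1.0 - 1.0 / marg_var

    # Average the conditional means over the posterior of lam.
    return y * np.sum(weights * keep) / np.sum(weights)


if __name__ == "__main__":
    tau = 0.1  # a small global scale, i.e. strong overall shrinkage
    for y in (0.5, 10.0):
        ratio = horseshoe_posterior_mean(y, tau) / y
        print(f"y = {y:5.1f}: fraction of y retained = {ratio:.3f}")
```

With a small tau, an observation near zero is pulled almost entirely to zero, whereas a large observation keeps most of its magnitude because the heavy-tailed local scale can absorb it. This is the sense in which noise is shrunk globally while strong individual signals are left nearly unshrunk.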