Variable selection in Bayesian generalized linear-mixed models: an illustration using candidate gene case-control association studies.

Author information

Tsai Miao-Yu

Affiliation

Institute of Statistics and Information Science, National Changhua University of Education, Changhua, 500, Taiwan.

Publication information

Biom J. 2015 Mar;57(2):234-53. doi: 10.1002/bimj.201300259. Epub 2014 Sep 30.

DOI:10.1002/bimj.201300259

PMID:25267186

Abstract

The problem of variable selection in the generalized linear-mixed models (GLMMs) is pervasive in statistical practice. For the purpose of variable selection, many methodologies for determining the best subset of explanatory variables currently exist according to the model complexity and differences between applications. In this paper, we develop a "higher posterior probability model with bootstrap" (HPMB) approach to select explanatory variables without fitting all possible GLMMs involving a small or moderate number of explanatory variables. Furthermore, to save computational load, we propose an efficient approximation approach with Laplace's method and Taylor's expansion to approximate intractable integrals in GLMMs. Simulation studies and an application of HapMap data provide evidence that this selection approach is computationally feasible and reliable for exploring true candidate genes and gene-gene associations, after adjusting for complex structures among clusters.
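To make the phrase "Laplace's method ... to approximate intractable integrals in GLMMs" concrete, here is a minimal sketch for the simplest case: a logistic model with one random intercept per cluster. It is not the HPMB procedure and not the paper's approximation (which also uses a Taylor expansion); the function names, the fixed linear predictors `eta`, and the random-effect standard deviation `sigma` are all illustrative assumptions. The sketch approximates one cluster's marginal log-likelihood by expanding the log integrand to second order around its mode, and includes a brute-force trapezoid integral as a reference.

```python
import math


def log_joint(b, y, eta, sigma):
    """log p(y | b) + log p(b) for one cluster, with b ~ N(0, sigma^2).

    y   : list of 0/1 responses in the cluster
    eta : fixed-effect linear predictors (assumed known for this sketch)
    """
    loglik = sum(yj * (ej + b) - math.log1p(math.exp(ej + b))
                 for yj, ej in zip(y, eta))
    logprior = -0.5 * math.log(2 * math.pi * sigma ** 2) - b ** 2 / (2 * sigma ** 2)
    return loglik + logprior


def _grad_hess(b, y, eta, sigma):
    """First and second derivatives of log_joint with respect to b."""
    p = [1.0 / (1.0 + math.exp(-(ej + b))) for ej in eta]
    grad = sum(yj - pj for yj, pj in zip(y, p)) - b / sigma ** 2
    hess = -sum(pj * (1.0 - pj) for pj in p) - 1.0 / sigma ** 2
    return grad, hess


def laplace_log_marginal(y, eta, sigma, iters=50, tol=1e-10):
    """Laplace approximation to log of the integral of exp(log_joint) over b."""
    b = 0.0
    for _ in range(iters):
        grad, hess = _grad_hess(b, y, eta, sigma)
        step = grad / hess
        b -= step  # Newton step towards the mode of the concave log_joint
        if abs(step) < tol:
            break
    _, hess = _grad_hess(b, y, eta, sigma)
    # Gaussian integral of the second-order expansion around the mode
    return log_joint(b, y, eta, sigma) + 0.5 * math.log(2 * math.pi / -hess)


def quadrature_log_marginal(y, eta, sigma, n=4001, width=10.0):
    """Reference value: trapezoid rule on [-width*sigma, width*sigma]."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    vals = [math.exp(log_joint(lo + i * h, y, eta, sigma)) for i in range(n)]
    return math.log(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))


if __name__ == "__main__":
    # Made-up toy cluster, for illustration only
    y = [0, 1, 1, 0, 1, 0, 1, 0]
    eta = [-0.5, 0.2, 0.1, -1.0, 0.7, 0.0, 0.3, -0.2]
    print("Laplace   :", laplace_log_marginal(y, eta, 1.0))
    print("Quadrature:", quadrature_log_marginal(y, eta, 1.0))
```

With a single random effect the brute-force integral is cheap, so the two values can be compared directly. The appeal of a Laplace-type approximation is that it replaces the integral with an optimisation and a curvature evaluation, which remains tractable when the random effects are higher-dimensional and many candidate models must be scored.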
