Yi Nengjun, Ma Shuangge
University of Alabama, Birmingham, AL, USA.
Stat Appl Genet Mol Biol. 2012 Nov 26;11(6). doi: 10.1515/1544-6115.1803.
Abstract
Genetic and other scientific studies routinely generate very large numbers of predictor variables that fall naturally into groups, with predictors in the same group being highly correlated. It is desirable to incorporate this hierarchical structure of the predictors into generalized linear models for simultaneous variable selection and coefficient estimation. We propose two prior distributions for the coefficients of generalized linear models: a hierarchical Cauchy prior and a hierarchical double-exponential prior. These hierarchical priors include both variable-specific and group-specific tuning parameters, so they not only apply different amounts of shrinkage to different coefficients and different groups but also provide a way to pool information within groups. We fit generalized linear models with the proposed hierarchical priors by incorporating flexible expectation-maximization (EM) algorithms into the standard iteratively weighted least squares (IWLS) procedure as implemented in the general-purpose statistical package R. The methods are illustrated with data from an experiment to identify genetic polymorphisms associated with survival of mice following infection with Listeria monocytogenes. The performance of the proposed procedures is further assessed via simulation studies. The methods are implemented in the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
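To make the fitting strategy concrete, here is a minimal Python sketch of the EM-within-IWLS idea for a logistic model. It is not the BhGLM code (BhGLM is an R package), and it deliberately simplifies the paper's setting: it assumes a single, fixed double-exponential scale `s` shared by all slopes, whereas the proposed hierarchical priors add variable-specific and group-specific tuning parameters that this sketch does not implement. The function name `em_iwls_logistic_de`, its arguments, and the simulated data are hypothetical, chosen for illustration. The prior is written as a normal scale mixture, so the E-step replaces each unknown prior variance tau_j^2 with E[1/tau_j^2 | beta_j] = 1/(s |beta_j|), and the M-step is a single IWLS step with the resulting ridge-type penalty.

```python
import numpy as np


def em_iwls_logistic_de(X, y, s, n_iter=1000, tol=1e-8, eps=1e-10):
    """Hypothetical helper (not part of BhGLM): posterior mode of a logistic
    regression with a double-exponential prior beta_j ~ DE(0, s) on each slope.

    Assumed normal scale-mixture form of the prior:
        beta_j | tau_j^2 ~ N(0, tau_j^2),  tau_j^2 ~ Exponential(rate = 1/(2 s^2))
    E-step: E[1/tau_j^2 | beta_j] = 1 / (s * |beta_j|)
    M-step: one IWLS step with a ridge penalty using those precisions.
    The intercept (column added here) is left unpenalized.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])        # prepend intercept column
    p = X1.shape[1]
    beta = np.zeros(p)
    # Starting precisions: a N(0, s^2) prior on slopes, none on the intercept.
    prec = np.r_[0.0, np.full(p - 1, 1.0 / s**2)]
    converged = False
    for _ in range(n_iter):
        # M-step: one penalized IWLS (Newton) step for the logistic model.
        eta = X1 @ beta
        mu = 1.0 / (1.0 + np.exp(-np.clip(eta, -30, 30)))
        w = np.maximum(mu * (1.0 - mu), 1e-10)   # IWLS weights
        z = eta + (y - mu) / w                    # working response
        A = X1.T @ (w[:, None] * X1) + np.diag(prec)
        beta_new = np.linalg.solve(A, X1.T @ (w * z))
        # E-step: expected prior precision of each slope given beta_new.
        prec[1:] = 1.0 / (s * np.maximum(np.abs(beta_new[1:]), eps))
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            converged = True
            break
    return beta, converged


# Simulated example: 2 true signals among 10 predictors (illustrative data).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:2] = [1.5, -1.5]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_hat, ok = em_iwls_logistic_de(X, y, s=0.1)
print("converged:", ok)
print("intercept and slopes:", np.round(beta_hat, 3))
```

At a fixed point of this iteration, every clearly nonzero slope satisfies X_j^T (y - mu) = sign(beta_j)/s, which is the stationarity condition for the posterior mode under a double-exponential prior, while slopes with weak support are driven toward zero. For a Cauchy prior written in the same normal-mixture form (tau_j^2 following a scaled inverse-chi-square distribution with one degree of freedom), the E-step expression would instead be 2/(s^2 + beta_j^2).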