Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Bioinformatics. 2012 Jul 1;28(13):1738-44. doi: 10.1093/bioinformatics/bts261. Epub 2012 May 4.
For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics. It does so through a novel normally distributed statistic, defined for every marker within the GWAS and based on results from a variational Bayes spike regression algorithm.
We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate in a larger cohort, where they reach a stringent cutoff for marginal association.
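For orientation, the single marker analysis used as a baseline above can be sketched as follows. This is a minimal illustration, not the vBsr algorithm and not the authors' simulation design: the function names (`simulate`, `marginal_z`, `bonferroni_hits`), the sample size, effect size, allele frequency and the Bonferroni cutoff are all assumptions chosen for the example. Each marker is tested on its own by simple linear regression, and its z-statistic is compared against a normal-approximation threshold that controls the family-wise Type I error rate.

```python
import math
import random
from statistics import NormalDist


def simulate(n=1000, m=200, causal=(0, 1, 2), effect=0.6, maf=0.3, seed=1):
    """Simulate genotypes (0/1/2 allele counts) and a quantitative trait.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each genotype is the sum of two independent Bernoulli(maf) alleles.
    G = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(m)]
         for _ in range(n)]
    # Trait = additive effects of the causal markers + standard normal noise.
    y = [sum(effect * row[j] for j in causal) + rng.gauss(0, 1) for row in G]
    return G, y


def marginal_z(G, y):
    """Return one z-statistic per marker from simple linear regression."""
    n, m = len(G), len(G[0])
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    syy = sum(v * v for v in yc)
    zs = []
    for j in range(m):
        col = [row[j] for row in G]
        gbar = sum(col) / n
        gc = [g - gbar for g in col]
        sgg = sum(g * g for g in gc)
        if sgg == 0:  # monomorphic marker: no information
            zs.append(0.0)
            continue
        sgy = sum(g * v for g, v in zip(gc, yc))
        beta = sgy / sgg                     # least-squares slope
        rss = syy - beta * sgy               # residual sum of squares
        se = math.sqrt(rss / (n - 2) / sgg)  # standard error of the slope
        zs.append(beta / se)
    return zs


def bonferroni_hits(zs, alpha=0.05):
    """Indices of markers whose |z| exceeds a two-sided Bonferroni cutoff."""
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * len(zs)))
    return [j for j, z in enumerate(zs) if abs(z) > cutoff]


if __name__ == "__main__":
    G, y = simulate()
    hits = bonferroni_hits(marginal_z(G, y))
    print("markers passing the Bonferroni cutoff:", hits)
```

Because each marker is tested in isolation, this baseline ignores the joint contribution of the other markers. That is the limitation penalized multiple regression approaches such as the lasso and the method described here aim to address.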
An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html.