Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA.
Bioinformatics. 2011 Mar 1;27(5):686-92. doi: 10.1093/bioinformatics/btq728. Epub 2011 Jan 25.
In genome-wide association studies (GWAS) of complex diseases, genetic variants with real but weak associations often fail to reach the stringent genome-wide significance level. Pathway analysis, which tests disease association using the combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities of genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. The model includes a fixed effects component that models the mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean; it therefore belongs to the class of mixed effects models.
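As a rough illustration of the fixed-plus-random structure described above, a generic two-level mixed model can be written as follows. This is only a sketch under stated assumptions: the symbols $y_{ij}$, $x_j$, $\beta_0$, $\beta_1$, $b_j$, $\varepsilon_{ij}$, $\sigma_b^2$ and $\sigma^2$ are introduced here for illustration and are not taken from the paper, whose actual model may differ (for example in its link function and in how it adjusts for LD and overlapping genes).

```latex
% Illustrative notation (assumed, not the paper's):
%   y_{ij} : association statistic for SNP i in gene j
%   x_j    : 1 if gene j belongs to the pathway being tested, 0 otherwise
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 x_j + b_j + \varepsilon_{ij}, \\
  b_j &\sim \mathcal{N}(0, \sigma_b^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

In this sketch, $\beta_1$ is the fixed effect capturing the mean shift in association for the gene group, and $b_j$ is the random effect describing how gene $j$ deviates from that group mean. A pathway-level test would then correspond to testing $H_0\colon \beta_1 = 0$.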
The proposed model is computationally efficient and uses only summary statistics. It also corrects for overlapping genes and linkage disequilibrium (LD). Using simulated and real GWAS data, we showed that our model improved power over currently available pathway analysis methods while preserving the type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated that mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. The proposed methodology therefore provides an efficient statistical modeling framework for systems analysis of GWAS.
The software code for mixed model analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.