Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Biostatistics. 2010 Jan;11(1):139-50. doi: 10.1093/biostatistics/kxp043. Epub 2009 Oct 12.
Genome-wide association studies (GWAS) are increasingly used to identify novel genetic variants that confer susceptibility to complex traits, but there is little consensus on how such data should be analyzed. The most commonly used methods are single-SNP (single nucleotide polymorphism) analysis and haplotype analysis, with Bonferroni correction for multiple comparisons. Because the SNPs in a typical GWAS are often in linkage disequilibrium (LD), at least locally, the tests are correlated, so Bonferroni correction gives conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field (HMRF) model for GWAS analysis, defined on a weighted LD graph built from prior LD information among the SNPs, together with an efficient iterated conditional modes algorithm for estimating the model parameters. The model uses the LD information when calculating the posterior probability that each SNP is associated with the disease. These posterior probabilities then define a false discovery rate (FDR) controlling procedure for selecting disease-associated SNPs. Simulation studies demonstrated a potential gain in power over single-SNP analysis. The proposed method is especially effective at identifying SNPs that show only borderline significance at the single-marker level but are in high LD with significant SNPs. In addition, by jointly modeling SNPs in LD, the proposed method can also reduce the number of false identifications of disease-associated SNPs. We demonstrate the proposed HMRF model on data from a case-control GWAS of neuroblastoma and identify one new SNP that is potentially associated with neuroblastoma.
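The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Each SNP carries a hidden label (1 = associated). The labels follow an Ising-type prior with a hypothetical external field `h` and interaction strength `beta`, applied to LD-graph edge weights `w` (for example r²). Each SNP's z-score is modeled as N(0, 1) under the null and N(`mu`, 1) otherwise. Iterated conditional modes (ICM) runs with these parameters held fixed, whereas the paper estimates them. The selection step uses the common rule of taking the largest set of SNPs whose average posterior null probability is at most `alpha`, which may differ in detail from the paper's procedure.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_odds(i, z, labels, neighbors, h, beta, mu):
    """Conditional log-odds that SNP i is associated, given its z-score and
    the current labels of its LD neighbours (assumed Ising-type prior,
    assumed Gaussian emissions N(0,1) vs N(mu,1))."""
    # Prior term: field h plus weighted agreement/disagreement with neighbours.
    prior = h + beta * sum(w * (2 * labels[j] - 1) for j, w in neighbors[i])
    # Log likelihood ratio: log N(z; mu, 1) - log N(z; 0, 1) = mu*z - mu^2/2.
    lik = mu * z[i] - 0.5 * mu * mu
    return prior + lik


def icm_posteriors(z, edges, h=-2.0, beta=1.0, mu=3.0, max_iter=50):
    """ICM with fixed, hypothetical parameters h, beta, mu.
    edges: (i, j, w) triples of an undirected weighted LD graph.
    Returns the posterior association probability of each SNP."""
    n = len(z)
    neighbors = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    # Start from the labels a single-SNP analysis would pick (no LD term).
    labels = [1 if h + mu * zi - 0.5 * mu * mu > 0 else 0 for zi in z]
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # Set each label to its conditional mode given the others.
            new = 1 if log_odds(i, z, labels, neighbors, h, beta, mu) > 0 else 0
            if new != labels[i]:
                labels[i] = new
                changed = True
        if not changed:
            break
    return [sigmoid(log_odds(i, z, labels, neighbors, h, beta, mu)) for i in range(n)]


def fdr_select(post, alpha=0.05):
    """Largest set of SNPs whose average posterior null probability <= alpha."""
    order = sorted(range(len(post)), key=lambda i: 1.0 - post[i])
    selected, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += 1.0 - post[i]
        # Null probabilities are sorted ascending, so the running mean only grows.
        if total / k > alpha:
            break
        selected.append(i)
    return selected


if __name__ == "__main__":
    z = [4.5, 2.0, 0.3, 0.1, 4.0]            # toy z-scores
    edges = [(0, 1, 0.9), (2, 3, 0.5)]       # SNP 1 in high LD with SNP 0
    post = icm_posteriors(z, edges)
    print([round(p, 3) for p in post])      # → [1.0, 0.599, 0.002, 0.001, 0.996]
    print(fdr_select(post, alpha=0.05))      # → [0, 4]
```

In this toy example, SNP 1 is only borderline on its own (z = 2.0). Its posterior probability is about 0.38 when the LD edge to the strongly significant SNP 0 is removed and rises to about 0.60 with that edge in place. This mirrors the behaviour the abstract describes for borderline SNPs in high LD with significant ones.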