Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Am J Hum Genet. 2010 Jun 11;86(6):929-42. doi: 10.1016/j.ajhg.2010.05.002.
Genome-wide association studies (GWAS) have emerged as popular tools for identifying genetic variants associated with disease risk. Standard analysis of a case-control GWAS assesses the association between disease risk and each genotyped single-nucleotide polymorphism (SNP) individually. However, this approach suffers from limited reproducibility and has difficulty detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, and then testing the joint effect of each SNP set. Each SNP set is tested with the logistic kernel-machine-based test, which rests on a statistical framework that allows flexible modeling of epistatic and nonlinear SNP effects. This flexibility, together with the ability to adjust naturally for covariate effects, makes our test appealing in comparison to individual-SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have greater power than standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP and SNP-set analyses tend to have low power. Finally, we apply SNP-set analysis to the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.
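To make the SNP-set idea concrete, the sketch below computes a kernel-machine score statistic for one SNP set, Q = (1/2)(y - μ̂)ᵀ K (y - μ̂), where y holds the 0/1 case-control labels, μ̂ the fitted probabilities under the null model, and K an n × n kernel matrix built from the set's genotypes. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes an identity-by-state (IBS) kernel and an intercept-only null model with no covariates, so μ̂ reduces to the overall case fraction. The function names `ibs_kernel`, `score_statistic`, and `permutation_pvalue` are hypothetical. The permutation p-value is swapped in for simplicity in place of the analytic approximation to the statistic's null distribution (a weighted mixture of chi-squares). Permuting labels is only valid here because there are no covariates to adjust for.

```python
import random


def ibs_kernel(genotypes):
    """IBS kernel for one SNP set (assumed kernel choice).

    genotypes[i] is a list of minor-allele counts (0, 1, or 2) for the
    p SNPs in the set, for subject i. K[i][j] is the number of alleles
    shared identical-by-state, summed over SNPs and scaled to [0, 1].
    """
    n = len(genotypes)
    p = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = shared / (2 * p)
    return K


def score_statistic(y, K):
    """Q = 0.5 * (y - mu)^T K (y - mu) under an intercept-only null model."""
    n = len(y)
    mu = sum(y) / n  # fitted null probability: the overall case fraction
    r = [yi - mu for yi in y]  # null-model residuals
    return 0.5 * sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_pvalue(y, K, n_perm=999, seed=0):
    """Permutation p-value for Q (substitute for an analytic null approximation).

    Shuffling the labels is only appropriate here because the null
    model has no covariates.
    """
    rng = random.Random(seed)
    q_obs = score_statistic(y, K)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(y_perm, K) >= q_obs:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)
```

As a usage check, a single-SNP set in which ten cases all carry genotype 2 and ten controls all carry genotype 0 gives the largest possible Q, so `permutation_pvalue` returns a value near its floor of 1/(n_perm + 1). A real SNP set would instead pass each subject's genotypes at every SNP mapped to a gene or haplotype block.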