Lee Jaehoon, Ahn Soyeon, Oh Sohee, Weir Bruce, Park Taesung
Department of Statistics, Seoul National University, San 56-1, Shilim-dong, Seoul, Korea.
BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S11. doi: 10.1186/1752-0509-5-S2-S11. Epub 2011 Dec 14.
Current genome-wide association (GWA) analysis focuses mainly on single genetic variants, and so may miss variants that have small individual effects but large joint effects. Considering multiple SNPs jointly in GWA analysis can increase power. When multiple SNPs are considered jointly, the corresponding SNP-level association measures are likely to be correlated because of linkage disequilibrium (LD) among the SNPs.
We propose the SNP-based parametric robust analysis of gene-set enrichment (SNP-PRAGE) method, which adequately handles the correlation among SNP-level association measures and minimizes computing effort through a parametric assumption. SNP-PRAGE first derives gene-level association measures from SNP-level association measures by incorporating the size of the corresponding (or nearby) genes and the LD structure among SNPs. SNP-PRAGE then computes a gene-set-level summary over genes that are grouped by shared biological knowledge. This two-step summarization makes the within-set association measures independent of each other, so the central limit theorem can be appropriately applied in the parametric model.
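The two-step summarization can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration and not the authors' exact procedure: the function names `gene_level_score` and `gene_set_z` are hypothetical, the gene-level score is taken to be a sum of SNP z-scores rescaled by the LD correlation matrix (gene size is not modeled), and the gene-set statistic is a generic central-limit-theorem z-score comparing the set mean with the mean over all genes.

```python
import math
import numpy as np


def gene_level_score(snp_z, ld_corr):
    """Step 1 (assumed form): combine SNP z-scores for one gene.

    If the SNP z-scores are N(0, R) under the null, with R the LD
    correlation matrix, their sum has variance equal to the sum of all
    entries of R, so the rescaled sum is standard normal under the null.
    """
    snp_z = np.asarray(snp_z, dtype=float)
    ld_corr = np.asarray(ld_corr, dtype=float)
    return snp_z.sum() / math.sqrt(ld_corr.sum())


def gene_set_z(gene_scores, set_idx):
    """Step 2 (assumed form): parametric gene-set statistic.

    Compares the mean gene-level score inside the set with the mean over
    all genes, scaled by the overall standard deviation and sqrt(set size).
    Returns the z-score and a two-sided normal p-value.
    """
    gene_scores = np.asarray(gene_scores, dtype=float)
    mu = gene_scores.mean()
    sigma = gene_scores.std(ddof=1)  # sample standard deviation
    m = len(set_idx)
    z = (gene_scores[set_idx].mean() - mu) * math.sqrt(m) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    ld = [[1.0, 0.8], [0.8, 1.0]]
    g1 = gene_level_score([2.1, 1.9], ld)
    scores = [g1, 0.3, -0.5, 1.2, -1.1, 0.4]
    print(gene_set_z(scores, [0, 3]))
```

Because the p-value comes from a normal approximation instead of permutations, the cost per gene set is a single pass over the gene-level scores, which matches the computational motivation stated above.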
RESULTS & CONCLUSIONS: We applied SNP-PRAGE to two GWA data sets: hypertension data of 8,842 samples from the Korean population and bipolar disorder data of 4,806 samples from the Wellcome Trust Case Control Consortium (WTCCC). We found two enriched gene sets for hypertension and three enriched gene sets for bipolar disorder. In a simulation study comparing our method with other gene-set methods, SNP-PRAGE notably reduced the number of false positives while requiring much less computational effort than permutation-based gene-set approaches.