Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA.
Bioinformatics. 2011 Jan 1;27(1):1-8. doi: 10.1093/bioinformatics/btq600. Epub 2010 Oct 29.
Genome-wide association studies (GWAS) involving half a million or more single nucleotide polymorphisms (SNPs) allow genetic dissection of complex diseases in a holistic manner. The common practice of analyzing one SNP at a time does not fully realize the potential of GWAS to identify multiple causal variants and to predict risk of disease. Existing methods for joint analysis of GWAS data tend to have high false discovery rates (FDRs) and to miss causal SNPs that are marginally uncorrelated with disease.
We introduce GWASelect, a statistically powerful and computationally efficient variable selection method designed to tackle the unique challenges of GWAS data. This method searches iteratively over the potential SNPs conditional on previously selected SNPs and is thus capable of capturing causal SNPs that are marginally correlated with disease as well as those that are marginally uncorrelated with disease. A special resampling mechanism is built into the method to reduce false positive findings. Simulation studies demonstrate that GWASelect performs well under a wide spectrum of linkage disequilibrium patterns and can be substantially more powerful than existing methods in capturing causal variants while having a lower FDR. In addition, regression models based on GWASelect tend to yield more accurate prediction of disease risk than existing methods. The advantages of GWASelect are illustrated with the Wellcome Trust Case-Control Consortium (WTCCC) data.
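The two ideas described above, searching conditionally on SNPs already selected and resampling to filter out unstable selections, can be illustrated with a minimal sketch. This is not the GWASelect implementation. It assumes a quantitative outcome fitted by least squares (the paper concerns disease status), a greedy residual-correlation search, and random half-samples. The function names `forward_select` and `resampled_selection` and the parameters `n_steps`, `n_resamples` and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward_select(X, y, n_steps):
    """Greedy conditional search (illustrative, not the authors' algorithm).

    At each step, pick the SNP most correlated with the residual of y
    after regressing y on the SNPs selected so far.
    """
    Xc = X - X.mean(axis=0)          # centered genotypes
    yc = y - y.mean()                # centered outcome
    norms = np.linalg.norm(Xc, axis=0) + 1e-12
    resid = yc.copy()
    selected = []
    for _ in range(n_steps):
        score = np.abs(Xc.T @ resid) / norms
        if selected:
            score[selected] = -np.inf    # never pick the same SNP twice
        selected.append(int(np.argmax(score)))
        # Refit on all selected SNPs so the next step is conditional on them.
        beta, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
        resid = yc - Xc[:, selected] @ beta
    return selected


def resampled_selection(X, y, n_steps=3, n_resamples=50, threshold=0.6, seed=0):
    """Repeat the search on random half-samples and keep only the SNPs
    selected in at least `threshold` of the resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts[forward_select(X[idx], y[idx], n_steps)] += 1
    freq = counts / n_resamples
    return np.flatnonzero(freq >= threshold), freq
```

Calling `resampled_selection(X, y)` on an `n × p` genotype matrix coded 0/1/2 returns the indices of the stably selected SNPs together with each SNP's selection frequency. Raising `threshold` trades power for fewer false positives.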
The software implementing GWASelect is available at http://www.bios.unc.edu/~lin. Access to WTCCC data: http://www.wtccc.org.uk/.