Department of Statistics, University of Chicago, IL 60637, USA.
BMC Bioinformatics. 2011 Sep 29;12:384. doi: 10.1186/1471-2105-12-384.
In genome-wide association studies, it is widely accepted that multilocus methods are more powerful than testing single-nucleotide polymorphisms (SNPs) one at a time. Among statistical approaches that consider many predictors simultaneously, scan statistics are an effective tool for detecting susceptibility regions in the genome and for mapping disease genes. In this study, inspired by the idea of scan statistics, we propose a novel sliding-window method for identifying a parsimonious subset of contiguous SNPs that best predicts disease status.
Within each sliding window, we apply a forward model selection procedure, using generalized ridge logistic regression to assess model fit at each step. In power simulations, we compare the performance of our method with that of five other methods in current use. Averaged over all the conditions considered, our method has higher power than the others. We also apply our method to two published datasets, where it proves useful for identifying causal SNPs.
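The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: an ordinary ridge (L2) penalty stands in for the generalized ridge penalty, the model is fitted by plain gradient ascent, and the stopping rule (a minimum gain in penalized log-likelihood) is an assumption made for illustration. All names (`ridge_logistic_loglik`, `forward_select`, `scan_windows`, `make_toy_data`) and all parameter defaults are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ridge_logistic_loglik(X, y, cols, lam=1.0, iters=300, lr=0.1):
    """Fit logistic regression on the SNP columns `cols` with an L2 penalty
    (assumption: plain ridge, intercept unpenalized) by gradient ascent,
    and return the penalized log-likelihood reached."""
    n = len(y)
    b0, beta = 0.0, [0.0] * len(cols)
    for _ in range(iters):
        g0, g = 0.0, [0.0] * len(cols)
        for row, yi in zip(X, y):
            r = yi - sigmoid(b0 + sum(b * row[c] for b, c in zip(beta, cols)))
            g0 += r
            for k, c in enumerate(cols):
                g[k] += r * row[c]
        b0 += lr * g0 / n
        beta = [b + lr * (gk - lam * b) / n for b, gk in zip(beta, g)]
    ll = 0.0
    for row, yi in zip(X, y):
        p = sigmoid(b0 + sum(b * row[c] for b, c in zip(beta, cols)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll - 0.5 * lam * sum(b * b for b in beta)


def forward_select(X, y, window_cols, lam=1.0, min_gain=2.0):
    """Greedy forward selection inside one window: repeatedly add the SNP
    that most improves the penalized log-likelihood, and stop when the
    improvement drops below `min_gain` (an assumed stopping rule)."""
    selected = []
    best = ridge_logistic_loglik(X, y, selected, lam)
    remaining = list(window_cols)
    while remaining:
        score, col = max(
            (ridge_logistic_loglik(X, y, selected + [c], lam), c)
            for c in remaining
        )
        if score - best < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


def scan_windows(X, y, window=3, lam=1.0, min_gain=2.0):
    """Slide a window of contiguous SNPs along the genotype matrix and
    return (gain over the intercept-only model, window start, selected SNPs)
    for the best-scoring window."""
    null = ridge_logistic_loglik(X, y, [], lam)
    results = []
    for start in range(len(X[0]) - window + 1):
        cols = list(range(start, start + window))
        selected, score = forward_select(X, y, cols, lam, min_gain)
        results.append((score - null, start, selected))
    return max(results)


def make_toy_data():
    """Deterministic toy data: 40 cases, 40 controls, 6 SNPs coded 0/1/2.
    Only SNP 3 is associated with disease status."""
    X, y = [], []
    for i in range(80):
        case = 1 if i < 40 else 0
        q, r = divmod(i % 40, 5)
        row = [(q + c) % 3 for c in range(6)]    # unassociated SNPs
        carrier = (r != 0) if case else (r == 0)
        row[3] = 2 if carrier else 0             # associated SNP
        X.append(row)
        y.append(case)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    gain, start, selected = scan_windows(X, y, window=3)
    print(f"best window starts at SNP {start}, selected SNPs {selected}")
```

In this sketch the penalized log-likelihood serves both as the selection criterion within a window and as the score used to compare windows. A real analysis would need a calibrated threshold or a permutation-based significance assessment, which the sketch does not attempt.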
Our method automatically combines genetic information within local genomic regions and allows for linkage disequilibrium between SNPs. It overcomes some shortcomings of the scan statistics approach and is promising for genome-wide case-control association studies.