Silver Matt, Montana Giovanni
Imperial College London, UK.
Stat Appl Genet Mol Biol. 2012 Jan 6;11(1):Article 7. doi: 10.2202/1544-6115.1755.
Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our "pathways group lasso with adaptive weights" (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small.
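As a rough illustration of the two ingredients named above, a group lasso over SNPs grouped into pathways and a bootstrap ranking of selected pathways, the following is a minimal sketch and not the P-GLAW implementation. It assumes non-overlapping pathways, replaces the adaptive weighting procedure with the conventional square-root-of-group-size weights, and uses a generic proximal-gradient solver in place of the estimation techniques used in the paper. The function names, the penalty value `lam` and the simulated data are all hypothetical.

```python
import numpy as np

def group_lasso(X, y, groups, lam, weights=None, n_iter=300):
    """Proximal-gradient solver for
    (1/2n)||y - X b||^2 + lam * sum_g w_g * ||b_g||_2.
    `groups` is a list of index lists, one per pathway (assumed disjoint)."""
    n, p = X.shape
    if weights is None:
        # Assumption: standard sqrt(group size) weights, not adaptive ones.
        weights = [np.sqrt(len(g)) for g in groups]
    # Step size = 1 / Lipschitz constant of the least-squares gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g, w in zip(groups, weights):
            norm = np.linalg.norm(z[g])
            # Group soft-thresholding: shrink the whole pathway towards zero.
            scale = max(0.0, 1.0 - step * lam * w / norm) if norm > 0 else 0.0
            z[g] = scale * z[g]
        beta = z
    return beta

def bootstrap_selection_frequency(X, y, groups, lam, n_boot=50, seed=0):
    """Fraction of bootstrap samples in which each pathway is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(len(groups))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample subjects with replacement
        Xb = X[idx] - X[idx].mean(axis=0)       # centre genotypes
        yb = y[idx] - y[idx].mean()             # centre the trait
        beta = group_lasso(Xb, yb, groups, lam)
        counts += np.array([np.any(beta[g] != 0) for g in groups], dtype=float)
    return counts / n_boot

# Hypothetical simulated data: 4 pathways of 5 SNPs, only pathway 0 is causal.
rng = np.random.default_rng(1)
n, groups = 300, [list(range(5 * k, 5 * k + 5)) for k in range(4)]
X = rng.binomial(2, 0.3, size=(n, 20)).astype(float)   # minor-allele counts
y = X[:, groups[0]] @ np.full(5, 0.6) + rng.normal(size=n)

freq = bootstrap_selection_frequency(X, y, groups, lam=0.1)
ranking = np.argsort(-freq)   # pathways ordered by selection frequency
```

Ranking by selection frequency across resamples, rather than by a single fit, is meant to make the ordering less sensitive to which subjects happen to be in the sample. Handling overlapping pathways and pathway-specific selection bias would need the additional machinery the abstract describes.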