Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA.
Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Am J Hum Genet. 2021 Apr 1;108(4):669-681. doi: 10.1016/j.ajhg.2021.02.016. Epub 2021 Mar 16.
Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have two limitations: they have limited power to detect a gene set-phenotype association when only a small fraction of the genes are associated with the phenotype, and they cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that has the maximal evidence of association and can best account for the gene set association. Because it uses a pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, GAUSS effectively controlled the type I error rate and had greater power than several existing methods, particularly when a small proportion of genes accounted for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. GAUSS scaled to this analysis and identified 13,466 phenotype-gene set association pairs. Within these gene sets, we identified an average of 17.2 (max = 405) genes that underlie these gene set associations.
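The general idea of a subset-based gene set test can be illustrated with a minimal sketch. This is not the GAUSS implementation: the abstract does not give the test statistic or the p value algorithm, so the standardized top-k score, the function names (`best_subset_statistic`, `subset_test_pvalue`), and the plain Monte Carlo null are all assumptions made for illustration. The sketch takes gene-level z-scores and a gene-gene correlation matrix `R` (standing in for the correlation structure estimated from a reference panel), finds the subset of top-ranked genes with the largest standardized score, and calibrates that score against null draws from N(0, R).

```python
import numpy as np


def best_subset_statistic(z):
    """Return (score, gene indices) for the top-k subset with the largest score.

    Illustrative score only: the sum of the k largest z^2 values, centered by k
    and scaled by sqrt(2k). Calibration is left to the simulated null below.
    """
    z = np.asarray(z, dtype=float)
    chi2 = z ** 2
    order = np.argsort(chi2)[::-1]           # genes ranked by evidence, strongest first
    cumulative = np.cumsum(chi2[order])       # sum of the k strongest statistics
    k = np.arange(1, len(z) + 1)
    scores = (cumulative - k) / np.sqrt(2 * k)
    best = int(np.argmax(scores))
    return float(scores[best]), order[: best + 1]


def subset_test_pvalue(z, R, n_sim=2000, seed=0):
    """Monte Carlo p value for the subset score, with null z-scores drawn from N(0, R).

    R is an assumed gene-gene correlation matrix of the test statistics.
    """
    rng = np.random.default_rng(seed)
    observed, subset = best_subset_statistic(z)
    null_z = rng.multivariate_normal(np.zeros(len(z)), R, size=n_sim)
    null_scores = np.array([best_subset_statistic(row)[0] for row in null_z])
    # Add-one correction keeps the estimated p value strictly positive.
    p_value = (1 + np.sum(null_scores >= observed)) / (1 + n_sim)
    return float(p_value), subset
```

Calling `subset_test_pvalue(z, R)` returns both a gene set p value and the indices of the selected genes, mirroring the two outputs the abstract describes (a gene set association and the subset of genes that accounts for it). The smallest p value this Monte Carlo scheme can report is 1/(n_sim + 1), so it is not suited to the very small p values needed when testing thousands of gene sets and phenotypes.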