Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
Bioinformatics. 2018 Jul 1;34(13):i555-i564. doi: 10.1093/bioinformatics/bty271.
Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate pathway-level changes in transcriptomics experiments. For an experiment in which fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Across a variety of biological conditions, however, genes are not randomly distributed: many show consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and the experiment-specific patterns of gene set enrichment.
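The gene-set permutation test described above can be illustrated with a minimal sketch. This is a simplified, hedged illustration and not the GSEA or GSEA-InContext implementation: the function names `enrichment_score` and `gene_set_permutation_pvalue` are hypothetical, the running-sum statistic is a reduced form of the weighted Kolmogorov-Smirnov-like score, and the p-value is a plain one-sided empirical estimate. It assumes the input genes are already ranked and that at least one gene in the set has a nonzero score.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Simplified running-sum enrichment score over a ranked gene list."""
    in_set = np.isin(ranked_genes, list(gene_set))
    n_miss = len(ranked_genes) - in_set.sum()
    # Hits step up in proportion to |score|^p; misses step down uniformly.
    weights = np.abs(scores) ** p * in_set
    hit_step = weights / weights.sum()
    miss_step = (~in_set) / n_miss
    running = np.cumsum(hit_step - miss_step)
    # The score is the running-sum value with the largest absolute deviation.
    return running[np.argmax(np.abs(running))]

def gene_set_permutation_pvalue(ranked_genes, scores, gene_set,
                                n_perm=1000, seed=0):
    """Compare the observed score with scores of random, equally sized gene sets."""
    rng = np.random.default_rng(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = int(np.isin(ranked_genes, list(gene_set)).sum())
    # Null model: genes drawn uniformly at random from the input experiment.
    null = np.array([
        enrichment_score(
            ranked_genes, scores,
            rng.choice(ranked_genes, size=size, replace=False))
        for _ in range(n_perm)
    ])
    if observed >= 0:
        extreme = np.sum(null >= observed)
    else:
        extreme = np.sum(null <= observed)
    # Add-one correction keeps the empirical p-value above zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

Because every gene is equally likely to enter a permuted set, this null model ignores the fact that some genes are consistently up- or down-regulated across experiments, which is the limitation the abstract goes on to address.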
We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis.
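One way to picture a background-aware null model is to draw permuted gene sets non-uniformly, so that genes with consistent behavior across the background experiments are sampled more often. The sketch below is an assumption made for illustration only and is not the published GSEA-InContext algorithm (see the repository for that): the function names, the `floor` parameter and the weighting formula are all hypothetical.

```python
import numpy as np

def background_sampling_weights(background_ranks, floor=0.05):
    """Hypothetical per-gene sampling weights from background experiments.

    background_ranks: array of shape (n_experiments, n_genes); entry [e, g]
    is the rank position of gene g in experiment e, scaled to [0, 1]
    (0 = most up-regulated, 1 = most down-regulated).
    """
    ranks = np.asarray(background_ranks, dtype=float)
    centered = 1.0 - 2.0 * ranks              # +1 at the top, -1 at the bottom
    # Genes that sit at the same extreme in most experiments keep a large
    # absolute mean; genes that vary between experiments average toward zero.
    consistency = np.abs(centered.mean(axis=0))
    weights = consistency + floor             # floor keeps every gene drawable
    return weights / weights.sum()

def draw_null_gene_sets(genes, weights, set_size, n_perm=1000, seed=0):
    """Draw permuted gene sets using the background-derived weights."""
    rng = np.random.default_rng(seed)
    genes = np.asarray(genes)
    return [rng.choice(genes, size=set_size, replace=False, p=weights)
            for _ in range(n_perm)]
```

Scoring these permuted sets in place of uniformly drawn ones would shift the null distribution toward the enrichment patterns that are common in the background, so that only enrichment beyond those common patterns stands out as significant.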
GSEA-InContext (implemented in Python), the Supplementary results and the background expression compendium are available at: https://github.com/CostelloLab/GSEA-InContext.