Research Program on Biomedical Informatics, Hospital del Mar Medical Research Institute, Barcelona, Catalonia, Spain.
BMC Bioinformatics. 2013 Jan 16;14:7. doi: 10.1186/1471-2105-14-7.
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets.
To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state-of-the-art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show that GSVA works analogously with data from both microarray and RNA-seq experiments.
GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA helps meet the current need for GSE methods for RNA-seq data. GSVA is an open-source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.
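To make the idea of a sample-wise, unsupervised gene set summary concrete, the sketch below collapses a gene-by-sample expression table into a gene-set-by-sample table without using any phenotype labels. It is not the GSVA algorithm or the GSVA package API: it substitutes a much simpler stand-in (the mean of per-gene z-scores), and the function name `gene_set_scores`, the toy gene names, and the toy gene sets are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def gene_set_scores(expr, gene_sets):
    """Collapse a gene-by-sample table into a gene-set-by-sample table.

    expr: dict mapping gene name -> list of expression values, one per sample.
    gene_sets: dict mapping gene set name -> iterable of gene names.
    NOTE: simplified stand-in (mean z-score), not the GSVA statistic.
    """
    # Standardise each gene across samples so that genes are comparable.
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no signal; score it as 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]  # ignore unmeasured genes
        if not members:
            continue  # no measured genes, so no score for this set
        # One score per sample: average the standardised member genes.
        scores[name] = [
            mean(z[g][j] for g in members) for j in range(n_samples)
        ]
    return scores


# Hypothetical toy data: four genes measured in four samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [5.0, 5.0, 5.0, 5.0],
    "D": [4.0, 3.0, 2.0, 1.0],
}
gene_sets = {"up": ["A", "B"], "down": ["D"], "flat": ["C"], "missing": ["X"]}
scores = gene_set_scores(expr, gene_sets)
```

The resulting table has one row per gene set and one column per sample, so it can in principle be passed to the same downstream tools used for gene-level data, such as differential analysis or survival models. This is the sense in which a sample-wise enrichment score acts as a starting point rather than an end point of an analysis.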