Subramanian Aravind, Tamayo Pablo, Mootha Vamsi K, Mukherjee Sayan, Ebert Benjamin L, Gillette Michael A, Paulovich Amanda, Pomeroy Scott L, Golub Todd R, Lander Eric S, Mesirov Jill P
Broad Institute of Massachusetts Institute of Technology and Harvard, 320 Charles Street, Cambridge, MA 02141, USA.
Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. doi: 10.1073/pnas.0506580102. Epub 2005 Sep 30.
Although genome-wide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
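The abstract does not spell out the statistic behind the method. As a rough illustration of what "focusing on gene sets" can mean in practice, the following minimal sketch computes a weighted running-sum enrichment score over a ranked gene list, in the spirit of the statistic described in the full paper. The function name `enrichment_score`, the exponent parameter `p`, and the toy gene names are assumptions made for this example; this is not the distributed GSEA software, and it omits the permutation-based significance testing and normalization that a real analysis would need.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     scores: Sequence[float],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Return the maximum signed deviation from zero of a running sum.

    ranked_genes: genes ordered by their correlation with the phenotype.
    scores: the ranking metric for each gene, in the same order.
    gene_set: the genes whose enrichment is being tested.
    p: exponent weighting each hit by |score|**p (assumed default 1.0).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            # Step up, weighted by the gene's ranking metric.
            running += abs(s) ** p / hit_weight_total
        else:
            # Step down by a constant amount for genes outside the set.
            running -= 1.0 / n_miss
        # Track the largest excursion from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical): four genes ranked from most positive to most negative.
genes = ["a", "b", "c", "d"]
metric = [2.0, 1.0, -1.0, -2.0]
top_score = enrichment_score(genes, metric, {"a", "b"})
bottom_score = enrichment_score(genes, metric, {"c", "d"})
```

In this toy case a gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score, which is the behavior that lets a set-level view detect coordinated shifts that single-gene tests can miss.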