Spinelli Lionel, Carpentier Sabrina, Montañana Sanchis Frédéric, Dalod Marc, Vu Manh Thien-Phong
Centre d'Immunologie de Marseille-Luminy, Aix Marseille University UM2, Inserm, U1104, CNRS UMR7280, F-13288, Marseille, Cedex 09, France.
Mi-mAbs (C/O CIML), F-13009, Marseille, France.
BMC Genomics. 2015 Oct 19;16:814. doi: 10.1186/s12864-015-2012-4.
Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from the single-gene to the gene-set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, generating home-made gene sets relevant to one's own biological question is crucial yet remains a substantial challenge for most biologists lacking statistical or bioinformatics expertise. This is all the more true when attempting to define a gene set specific to one condition compared with many others. Thus, there is a crucial need for easy-to-use software for generating relevant home-made gene sets from complex datasets, using them in GSEA, and correcting the results when they are applied to multiple comparisons of many experimental conditions.
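To make concrete how GSEA can pick up moderate but coordinated changes, the following is a minimal sketch of the unweighted running-sum enrichment score (the Kolmogorov-Smirnov-like statistic underlying GSEA). It assumes the genes have already been ranked between two conditions; the function name and the toy gene identifiers are hypothetical, and the standard GSEA implementation additionally weights hits by their ranking metric and estimates significance by permutation, which this sketch omits.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: gene identifiers ordered from most up-regulated to most
                  down-regulated between two conditions (assumed precomputed).
    gene_set:     iterable of gene identifiers forming the set to test.
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; the sum returns to 0.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example with hypothetical gene names.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})     # set concentrated at the top
bottom_score = enrichment_score(ranked, {"g5", "g6"})  # set concentrated at the bottom
```

A set whose members cluster at the top of the ranking yields a positive score, and one clustered at the bottom yields a negative score, even if no single gene changes strongly.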
We developed BubbleGUM (GSEA Unlimited Map), a tool that automatically extracts molecular signatures from transcriptomic data and performs exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM is its capacity to integrate and compare numerous GSEA results in a single, easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichment in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes.
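The two steps named above, extracting a signature specific to one condition and correcting many GSEA results for multiple testing, can be illustrated with a short sketch. This is not BubbleGUM's actual procedure, which the abstract does not specify: the selection rule (the target condition must exceed every other condition by a minimum log2 difference), the Benjamini-Hochberg correction, the function names, and the cell-type labels are all assumptions chosen for illustration.

```python
def specific_signature(expr, target, min_log2_diff=1.0):
    """Genes higher in `target` than in every other condition.

    expr: {gene: {condition: mean log2 expression}} (hypothetical layout).
    A gene is kept when its value in `target` exceeds the highest value
    among all other conditions by at least `min_log2_diff`.
    """
    signature = []
    for gene, by_condition in expr.items():
        others = [v for cond, v in by_condition.items() if cond != target]
        if others and by_condition[target] - max(others) >= min_log2_diff:
            signature.append(gene)
    return signature


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: gene names, condition labels, and values are illustrative only.
expr = {
    "GeneA": {"cell_type_1": 9.0, "cell_type_2": 5.0, "cell_type_3": 6.0},
    "GeneB": {"cell_type_1": 7.0, "cell_type_2": 6.5, "cell_type_3": 4.0},
}
signature = specific_signature(expr, target="cell_type_1")

# One nominal p-value per (gene set, pairwise comparison) GSEA run.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
```

Comparing against the maximum of all other conditions, rather than their average, is one simple way to require specificity to a single condition when many conditions are present; the threshold of one log2 unit is arbitrary here.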
BubbleGUM is open-source software that automatically generates molecular signatures from complex expression datasets and directly assesses their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps in interpreting the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarity between microarray datasets from different laboratories or obtained with different experimental models or clinical cohorts. BubbleGUM runs through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html .