Philip Morris International Research & Development, Quai Jeanrenaud 5, CH-2000 Neuchatel, Switzerland.
BMC Genomics. 2013 Jul 29;14:514. doi: 10.1186/1471-2164-14-514.
High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges for a number of reasons, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, identifying the genes that are differentially expressed for the effect(s) or contrast(s) of interest is the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) that leads to mechanistic insights. It is therefore important to systematically store, in a comparable manner, the full list of genes with their associated statistical results (differential expression, t-statistic, p-value) for one or more effect(s) or contrast(s) of interest (termed "contrast data" for short), and to extract gene sets from them, so as to efficiently support downstream analyses and leverage the data over the long term. Meeting this need would open new research perspectives for biologists, helping them discover disease-related biomarkers and understand the molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental).
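As a minimal sketch of what a contrast-data record and gene set extraction could look like, the Python below assumes a hypothetical per-gene schema (gene, log2 fold change, t-statistic, p-value) and illustrative cutoffs (p < 0.05, |log2 fold change| >= 1). It splits one contrast into up- and down-regulated gene sets and also keeps the full list ranked by t-statistic. The class and function names, the `UP`/`DN` labels, the cutoffs, and the toy values are all assumptions for illustration, not Confero's storage format or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContrastRow:
    """One gene's statistics for a single contrast (hypothetical schema)."""
    gene: str
    log_fc: float   # differential expression as log2 fold change
    t_stat: float   # t-statistic for the contrast
    p_value: float  # p-value (raw or adjusted, depending on the study)


def extract_gene_sets(rows, p_cutoff=0.05, fc_cutoff=1.0):
    """Split one contrast into up- and down-regulated gene sets.

    The cutoffs are illustrative defaults, not Confero's settings.
    """
    significant = [r for r in rows if r.p_value < p_cutoff]
    up = {r.gene for r in significant if r.log_fc >= fc_cutoff}
    down = {r.gene for r in significant if r.log_fc <= -fc_cutoff}
    return {"UP": up, "DN": down}


def ranked_genes(rows):
    """Full gene list ranked by t-statistic (descending), as rank-based methods use it."""
    return [r.gene for r in sorted(rows, key=lambda r: r.t_stat, reverse=True)]


# Toy values for illustration only; not taken from any dataset.
contrast = [
    ContrastRow("GENE_A", 2.8, 12.1, 1e-6),
    ContrastRow("GENE_B", 1.9, 8.4, 3e-5),
    ContrastRow("GENE_C", -1.4, -5.2, 2e-3),
    ContrastRow("GENE_D", 0.1, 0.4, 0.70),
]
sets = extract_gene_sets(contrast)
print(sorted(sets["UP"]))      # ['GENE_A', 'GENE_B']
print(sorted(sets["DN"]))      # ['GENE_C']
print(ranked_genes(contrast))  # ['GENE_A', 'GENE_B', 'GENE_D', 'GENE_C']
```

Storing the full ranked list alongside the thresholded sets matters because threshold-based methods (such as ORA) consume the sets, while rank-based methods (such as GSEA) need every gene.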
To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple, standard format; data transformation to enable cross-study and cross-platform data comparison; and automatic extraction and storage of gene sets, building new a priori knowledge that is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module, together with additional tools that support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate the functionality of the Confero platform, we walk through the major aspects of the Confero workflow and its results using the dataset from the Bioconductor estrogen package.
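ORA and GSEA are named above but not specified. As a hedged illustration, the sketch below uses their textbook formulations: a one-sided hypergeometric test for ORA, and the weighted running-sum enrichment score from the original GSEA method (without GSEA's permutation-based significance testing). This is not Confero's analysis module or the GSEA software's code; the function names, the `weight` parameter default, and the toy gene lists are assumptions for illustration.

```python
from math import comb


def ora_pvalue(query, gene_set, universe):
    """One-sided hypergeometric p-value P(X >= k) for over-representation."""
    universe = set(universe)
    query = set(query) & universe
    gene_set = set(gene_set) & universe
    n_total, n_set, n_query = len(universe), len(gene_set), len(query)
    k = len(query & gene_set)  # observed overlap
    tail = sum(
        comb(n_set, i) * comb(n_total - n_set, n_query - i)
        for i in range(k, min(n_query, n_set) + 1)
    )
    return tail / comb(n_total, n_query)


def enrichment_score(ranked, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score over a ranked gene list.

    Hits step up by |score|**weight / N_R and misses step down by 1 / N_miss;
    the result is the signed running-sum value farthest from zero.
    """
    hits = [g in gene_set for g in ranked]
    n_miss = len(ranked) - sum(hits)
    n_r = sum(abs(s) ** weight for s, hit in zip(scores, hits) if hit)
    if n_r == 0 or n_miss == 0:
        raise ValueError("gene set must hit the list with nonzero weight and leave some misses")
    running = best = 0.0
    for s, hit in zip(scores, hits):
        running += abs(s) ** weight / n_r if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


universe = [f"G{i}" for i in range(20)]
print(ora_pvalue(["G0", "G1", "G2", "G3"], ["G0", "G1", "G2", "G3", "G4"], universe))
# equals 5 / C(20, 4) = 5 / 4845

ranked = ["A", "B", "C", "D"]
scores = [4.0, 3.0, -1.0, -2.0]
print(enrichment_score(ranked, scores, {"C", "D"}))  # -1.0
```

The two tests answer different questions: ORA asks whether a thresholded gene list overlaps a gene set more than chance, whereas the enrichment score asks whether a gene set's members concentrate at either end of the full ranked list, so no significance cutoff is needed to build the input.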
Confero provides a unique and flexible platform that supports downstream computational analysis and facilitates biological interpretation. The system is designed to provide researchers with a simple, innovative, and extensible solution for storing and exploiting analyzed data in a sustainable and reproducible manner, thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.