Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA 02115, USA.
BMC Bioinformatics. 2011 Nov 21;12:452. doi: 10.1186/1471-2105-12-452.
Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Even though several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored for a single type of technology or measurement and do not support the integration of multiple data types.
We have developed eXframe, a reusable web-based framework for genomics experiments that provides 1) the ability to publish structured data compliant with accepted standards; 2) support for multiple data types, including microarrays and next generation sequencing; and 3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples). It is available as open-source software. We present two case studies in which this software is currently being used to build repositories of genomics experiments: one contains data from hematopoietic stem cells and the other from Parkinson's disease patients.
The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable: other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own useful modifications.