Boyle John, Rovira Hector, Cavnor Chris, Burdick David, Killcoyne Sarah, Shmulevich Ilya
Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA.
BMC Bioinformatics. 2009 Mar 6;10:79. doi: 10.1186/1471-2105-10-79.
In research, each experiment is different: the focus changes, and the data is generated by a continually evolving range of technologies. New techniques are continually introduced, ranging from in-house protocols to high-throughput instrumentation. To support these requirements, data management systems are needed that can be built rapidly and adapted readily to new uses.
The adaptable data management system discussed here is designed to support the seamless mining and analysis of the biological experiment data commonly used in systems biology (e.g. ChIP-chip, gene expression, proteomics, imaging, flow cytometry). We use different content graphs to represent different views of the data. These views are designed for different roles: equipment-specific views are used to gather instrumentation information; data-processing-oriented views enable the rapid development of analysis applications; and research-project-specific views organize information for individual research experiments. The system allows for both the rapid introduction of new types of information and the evolution of the knowledge it represents.
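As a rough illustration of how several content graphs can act as different views of the same data, the following minimal Python sketch stores one data item once and reaches it through both an equipment-specific graph and a research-project-specific graph. It is not the system's implementation: the `ContentNode` class, the `find` method, and every node name and property (`scan_001.cel`, `scanner_A`, `project_X`, `normalization`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    """Hypothetical content-graph node: a name, free-form properties, children."""
    name: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, path):
        """Resolve a slash-separated path relative to this node."""
        node = self
        for part in path.split("/"):
            matches = [c for c in node.children if c.name == part]
            if not matches:
                raise KeyError(part)
            node = matches[0]
        return node


# One data item, stored once (file name and properties are made up).
scan = ContentNode(
    "scan_001.cel",
    {"assay": "gene expression", "instrument": "scanner_A"},
)

# Equipment-specific view: data organized by the instrument that produced it.
equipment = ContentNode("equipment")
equipment.add(ContentNode("scanner_A")).add(scan)

# Research-project-specific view: the same item organized by experiment.
projects = ContentNode("projects")
projects.add(ContentNode("project_X")).add(ContentNode("experiment_1")).add(scan)

# A new kind of information is added as a property; no fixed schema changes.
scan.properties["normalization"] = "pending"

by_equipment = equipment.find("scanner_A/scan_001.cel")
by_project = projects.find("project_X/experiment_1/scan_001.cel")
```

In this sketch both paths resolve to the same object, so a property added through one view is visible through the other, which is one simple way to read the idea of multiple views over shared content.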
Data management is an important aspect of any research enterprise. It is the foundation on which most applications are built, and it must be easy to extend so that it can provide new functionality for new scientific areas. We have found that adopting a three-tier architecture for data management, built around distributed standardized content repositories, allows us to rapidly develop new applications to support a diverse user community.