Smedley Damian, Swertz Morris A, Wolstencroft Katy, Proctor Glenn, Zouberakis Michael, Bard Jonathan, Hancock John M, Schofield Paul
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
Brief Bioinform. 2008 Nov;9(6):532-44. doi: 10.1093/bib/bbn040.
The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication. As a consequence, data are being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fragmentation poses serious problems for the model organism community, which increasingly relies on data mining and computational approaches that require data to be gathered from a range of sources. In light of these problems, the European Commission has funded a coordination action, CASIMIR (coordination and sustainability of international mouse informatics resources), with a remit to assess the technical and social aspects of database interoperability that currently prevent data integration in mouse functional genomics from reaching its full potential. In this article, we assess the current problems with interoperability, with particular reference to mouse functional genomics, and critically review the technologies that can be deployed to overcome them. We describe a typical use-case in which an investigator wishes to gather data on variation, genomic context and metabolic pathway involvement for genes discovered in a genome-wide screen. We then develop an automated approach based on an in silico experimental workflow tool, Taverna, which uses web services, BioMart and MOLGENIS technologies for data retrieval. Finally, we focus on the current impediments to adopting such an approach in a wider context, and on strategies to overcome them.
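As a rough illustration of the "genomic context" retrieval step in the use-case, the following minimal Python sketch builds a BioMart-style XML query for a list of genes and encodes it into a request URL. It is not the workflow described in the article, which is built in Taverna. The endpoint URL, the dataset name `mmusculus_gene_ensembl`, the filter and attribute names, and the gene identifiers are all assumptions chosen for illustration, and the helper functions `build_query` and `build_request_url` are hypothetical. The sketch only constructs the request and performs no network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of a BioMart server; substitute the server you actually use.
MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (hypothetical helper).

    dataset    -- dataset name, e.g. a mouse gene dataset (assumed name)
    filters    -- dict mapping a filter name to a list of values
    attributes -- list of attribute names to return as columns
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, values in filters.items():
        # Multiple values for one filter are sent as a comma-separated list.
        ET.SubElement(ds, "Filter", {"name": name, "value": ",".join(values)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_request_url(xml_query, base_url=MART_URL):
    """URL-encode the XML query into a GET request URL (hypothetical helper)."""
    return base_url + "?" + urlencode({"query": xml_query})


if __name__ == "__main__":
    # Illustrative gene identifiers standing in for hits from a screen.
    genes = ["ENSMUSG00000000001", "ENSMUSG00000000028"]
    xml_query = build_query(
        dataset="mmusculus_gene_ensembl",            # assumed dataset name
        filters={"ensembl_gene_id": genes},          # assumed filter name
        attributes=["ensembl_gene_id", "chromosome_name",
                    "start_position", "end_position"],  # assumed attributes
    )
    print(xml_query)
    print(build_request_url(xml_query))
```

In a workflow tool, a request of this kind would be one step whose tabular output feeds later steps, for example variation or pathway look-ups against other resources. Keeping query construction separate from the network call makes the step easy to inspect and to reuse with a different server or dataset.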