Manning Maurice, Aggarwal Amit, Gao Kevin, Tucker-Kellogg Greg
Lilly Singapore Centre for Drug Discovery, 8A Biomedical Grove #02-05, Immunos, Biopolis, 138648, Singapore.
Brief Bioinform. 2009 Mar;10(2):164-76. doi: 10.1093/bib/bbp007.
Current data integration approaches by bioinformaticians frequently involve using scripts to extract data from a wide variety of public and private data repositories, each with a unique vocabulary and schema. These separate data sets must then be normalized through the tedious and lengthy process of resolving naming differences and collecting information into a single view. Attempts to consolidate such diverse data using data warehouses or federated queries add significant complexity and have shown limitations in flexibility. The alternative of complete semantic integration of data requires a massive, sustained effort in mapping data types and maintaining ontologies. We focused instead on creating a data architecture that leverages semantic mapping of experimental metadata to support the rapid prototyping of scientific discovery applications, with the twin goals of reducing architectural complexity while still leveraging semantic technologies to provide flexibility, efficiency and more fully characterized data relationships. A metadata ontology was developed to describe our discovery process. A metadata repository was then created by mapping metadata from existing data sources into this ontology, generating RDF triples to describe the entities. Finally, an interface to the repository was designed that provides not only search and browse capabilities but also complex query templates that aggregate data from both RDF and RDBMS sources. We describe how this approach (i) allows scientists to discover and link relevant data across diverse data sources and (ii) provides a platform for development of integrative informatics applications.
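The pipeline described in the abstract (map source metadata into an ontology as RDF triples, then answer query templates that combine the triples with relational data) can be illustrated with a minimal sketch. This is not the authors' implementation: the namespace `NS`, the predicate names (`fromSource`, `targetGene`, `assayType`), the source names, the `measurements` table and all sample values are hypothetical, and a plain Python list of tuples stands in for a real RDF store.

```python
import sqlite3

# Hypothetical namespace; the ontology actually used in the paper is not shown.
NS = "http://example.org/discovery#"


def record_to_triples(source, record):
    """Map one metadata record (a dict with an 'id' key) to
    (subject, predicate, object) triples."""
    subject = f"{NS}{source}/{record['id']}"
    triples = [(subject, NS + "fromSource", source)]
    for key, value in record.items():
        if key != "id":
            triples.append((subject, NS + key, value))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples that match a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def experiments_with_results(triples, conn, gene):
    """Query template: find experiments annotated with `gene` in the
    triple list, then fetch their measurements from the relational table."""
    results = []
    for subject, _, _ in match(triples, p=NS + "targetGene", o=gene):
        experiment_id = subject.rsplit("/", 1)[1]
        cursor = conn.execute(
            "SELECT value FROM measurements "
            "WHERE experiment_id = ? ORDER BY value",
            (experiment_id,),
        )
        results.append((subject, [row[0] for row in cursor]))
    return results


def build_demo():
    """Build illustrative metadata triples and an in-memory SQLite table."""
    triples = []
    triples += record_to_triples(
        "assay_db", {"id": "E1", "targetGene": "EGFR", "assayType": "binding"})
    triples += record_to_triples(
        "array_db", {"id": "E2", "targetGene": "EGFR", "assayType": "expression"})
    triples += record_to_triples(
        "assay_db", {"id": "E3", "targetGene": "TP53", "assayType": "binding"})

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (experiment_id TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [("E1", 0.5), ("E1", 0.7), ("E2", 2.0), ("E3", 1.1)],
    )
    return triples, conn


if __name__ == "__main__":
    demo_triples, demo_conn = build_demo()
    for subject, values in experiments_with_results(demo_triples, demo_conn, "EGFR"):
        print(subject, values)
```

In this sketch only the metadata is expressed as triples, while the measurements stay in the relational table, which mirrors the abstract's point that mapping metadata alone avoids the cost of full semantic integration. A production system would more plausibly use an RDF store with SPARQL in place of the `match` helper.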