Callahan Alison, Cruz-Toledo José, Dumontier Michel
Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, ON, Canada.
J Biomed Semantics. 2013 Apr 15;4 Suppl 1(Suppl 1):S1. doi: 10.1186/2041-1480-4-S1-S1.
A key activity for life scientists in this post-"omics" age involves searching for and integrating biological data from a multitude of independent databases. However, our ability to find relevant data is hampered by non-standard web and database interfaces backed by an enormous variety of data formats. This heterogeneity presents an overwhelming barrier to the discovery and reuse of resources that have been developed at great public expense. To address this issue, the open-source Bio2RDF project promotes a simple convention to integrate diverse biological data using Semantic Web technologies. However, querying Bio2RDF remains difficult because Bio2RDF datasets are not represented uniformly.
We describe an update to Bio2RDF that includes tighter integration across 19 new and updated RDF datasets. All available open-source scripts were first consolidated into a single GitHub repository and then redeveloped using a common API that generates normalized IRIs from a centralized dataset registry. We then mapped dataset-specific types and relations to the Semanticscience Integrated Ontology (SIO) and demonstrated simplified federated queries across multiple Bio2RDF endpoints.
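As a rough illustration of what registry-driven IRI normalization can look like, the sketch below maps namespace synonyms to one preferred prefix and builds an IRI of the form `http://bio2rdf.org/namespace:identifier`. The registry contents, the `REGISTRY` and `BASE` names, and the `normalize_iri` function are hypothetical and are not taken from the Bio2RDF codebase; the real API and registry may differ.

```python
# Hypothetical, hand-written registry: lowercase namespace synonyms mapped to
# a single preferred prefix. The actual Bio2RDF registry is far larger.
REGISTRY = {
    "drugbank": "drugbank",
    "ncbigene": "ncbigene",
    "geneid": "ncbigene",
    "entrez gene": "ncbigene",
}

# Assumed base for normalized IRIs (namespace:identifier pattern).
BASE = "http://bio2rdf.org/"


def normalize_iri(namespace: str, identifier: str) -> str:
    """Return a normalized IRI for a (namespace, identifier) pair.

    The namespace is trimmed and lowercased before the registry lookup, so
    different spellings of the same source resolve to the same prefix.
    """
    key = namespace.strip().lower()
    prefix = REGISTRY.get(key)
    if prefix is None:
        raise KeyError(f"namespace not in registry: {namespace!r}")
    return f"{BASE}{prefix}:{identifier.strip()}"
```

With this sketch, `normalize_iri("GeneID", "1017")` and `normalize_iri("ncbigene", "1017")` yield the same IRI, which is the property that lets records from different datasets link to one another.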
This coordinated release marks an important milestone for the Bio2RDF open-source linked data framework. Principally, it improves the quality of linked data in the Bio2RDF network and makes it easier to access or recreate the linked data locally. We hope to continue improving the Bio2RDF network of linked data by identifying priority databases and extending vocabulary coverage to dataset vocabularies beyond SIO.