Cheung Kei-Hoi, Yip Kevin Y, Smith Andrew, Deknikker Remko, Masiar Andy, Gerstein Mark
Center for Medical Informatics, Yale University, New Haven, CT 06520, USA.
Bioinformatics. 2005 Jun;21 Suppl 1:i85-96. doi: 10.1093/bioinformatics/bti1026.
As semantic web technology matures and the need for life sciences data integration over the web grows, it is important to explore how the semantic web can address data integration needs. The main problem in data integration is the lack of widely accepted standards for expressing the syntax and semantics of the data. We address this problem by exploring the use of semantic web technologies, including the Resource Description Framework (RDF), RDF Site Summary (RSS), relational-database-to-RDF mapping (D2RQ) and a native RDF data repository, to represent, store and query both metadata and data across life sciences datasets.
Because many biological datasets are currently available in tabular format, we introduce an RDF structure into which they can be converted. We also develop a prototype web-based application, YeastHub, that demonstrates how a life sciences data warehouse can be built using a native RDF data store (Sesame). The warehouse integrates different types of yeast genome data provided by different resources in different formats, including tabular and RDF formats. Once the data are loaded into the warehouse, RDF-based queries can be formulated to retrieve the data in an integrated fashion.
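To make the tabular-to-RDF idea concrete, the following is a minimal sketch of one way such a conversion could work; it is not the RDF structure defined by the authors, which the abstract does not spell out. It assumes each row becomes one resource, each remaining column becomes a property, and values are written as plain literals in N-Triples. The `http://example.org/yeast/` namespace, the function names, the column names and the sample rows are all hypothetical and chosen only for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration only; not taken from YeastHub.
BASE = "http://example.org/yeast/"


def escape_literal(value: str) -> str:
    """Escape a string so it can be written as an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def table_to_ntriples(text: str, id_column: str, delimiter: str = "\t") -> list[str]:
    """Map each row to a resource and each other non-empty column to a property."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    triples = []
    for row in reader:
        # Percent-encode the identifier so the subject is a valid IRI.
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip surplus unnamed fields, the identifier column and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Illustrative tab-delimited sample, not data from the paper.
SAMPLE = "orf\tgene_name\tchromosome\nYAL001C\tTFC3\tI\nYAL002W\tVPS8\tI\n"

for line in table_to_ntriples(SAMPLE, id_column="orf"):
    print(line)
```

Triples in this form could then be loaded into an RDF store and queried by pattern, for example to find every resource that shares a given `chromosome` value regardless of which source table it came from.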
The YeastHub website is accessible via the following URL: http://yeasthub.gersteinlab.org.