Post Lennart J G, Roos Marco, Marshall M Scott, van Driel Roel, Breit Timo M
Integrative Bioinformatics Unit and Nuclear Organization Group, Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 SM, Amsterdam, The Netherlands.
Bioinformatics. 2007 Nov 15;23(22):3080-7. doi: 10.1093/bioinformatics/btm461. Epub 2007 Sep 19.
The numerous public data resources make integrative bioinformatics experimentation increasingly important in life sciences research. However, it is severely hampered by the way the data and information are made available. The semantic web approach enhances data exchange and integration by providing standardized formats such as RDF, RDF Schema (RDFS) and OWL, to achieve a formalized computational environment. Our semantic web-enabled data integration (SWEDI) approach aims to formalize biological domains by capturing the knowledge in semantic models using ontologies as controlled vocabularies. The strategy is to build a collection of relatively small but specific knowledge and data models, which together form a 'personal semantic framework'. This can be linked to external large, general knowledge and data models. In this way, the involved scientists are familiar with the concepts and associated relationships in their models and can create semantic queries using their own terms. We studied the applicability of our SWEDI approach in the context of a biological use case by integrating genomics data sets for histone modification and transcription factor binding sites.
We constructed four OWL knowledge models and two RDFS data models, transformed and mapped the relevant data to the data models, linked the data models to the knowledge models using linkage statements, and ran semantic queries. Our biological use case demonstrates the relevance of these kinds of integrative bioinformatics experiments. Our findings show that the SWEDI approach has high startup costs, but that extending it with similar data is straightforward.
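The workflow described above (knowledge models, data models, linkage statements, semantic queries) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain Python set of triples in place of a real RDF store, and every name in it (the `km:` and `dm:` prefixes, the class and property names, the coordinates) is a hypothetical placeholder chosen for illustration. The linkage statements are modelled here as `rdfs:subClassOf` triples, which is one plausible choice, not necessarily the one used in the paper.

```python
# Minimal sketch, assuming triples are (subject, predicate, object) tuples.
# All identifiers and data values below are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

TRIPLES = {
    # Knowledge model: concepts in the scientist's own terms.
    ("km:HistoneModification", SUBCLASS, "km:ChromatinFeature"),
    ("km:TranscriptionFactorBindingSite", SUBCLASS, "km:ChromatinFeature"),
    # Linkage statements: data-model classes tied to knowledge-model concepts.
    ("dm:HistoneModRecord", SUBCLASS, "km:HistoneModification"),
    ("dm:TfbsRecord", SUBCLASS, "km:TranscriptionFactorBindingSite"),
}

# Data mapped to the data models: (id, class, chromosome, start, end).
RECORDS = [
    ("data:h1", "dm:HistoneModRecord", "chr1", 100, 200),
    ("data:h2", "dm:HistoneModRecord", "chr2", 500, 600),
    ("data:t1", "dm:TfbsRecord", "chr1", 150, 180),
    ("data:t2", "dm:TfbsRecord", "chr1", 300, 320),
]
for rec_id, cls, chrom, start, end in RECORDS:
    TRIPLES |= {
        (rec_id, RDF_TYPE, cls),
        (rec_id, "dm:chrom", chrom),
        (rec_id, "dm:start", start),
        (rec_id, "dm:end", end),
    }


def subclasses_of(cls):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls):
    """Return sorted subjects typed as cls or as any of its subclasses."""
    classes = subclasses_of(cls)
    return sorted(s for s, p, o in TRIPLES if p == RDF_TYPE and o in classes)


def value(subject, predicate):
    """Return the single object stored for (subject, predicate)."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    raise KeyError((subject, predicate))


def overlapping_pairs(class_a, class_b):
    """Query in knowledge-model terms: features sharing a genomic region."""
    pairs = []
    for a in instances_of(class_a):
        for b in instances_of(class_b):
            same_chrom = value(a, "dm:chrom") == value(b, "dm:chrom")
            overlap = (value(a, "dm:start") <= value(b, "dm:end")
                       and value(b, "dm:start") <= value(a, "dm:end"))
            if same_chrom and overlap:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    print(overlapping_pairs("km:HistoneModification",
                            "km:TranscriptionFactorBindingSite"))
```

The query is phrased entirely in knowledge-model terms (`km:HistoneModification`, `km:TranscriptionFactorBindingSite`), and the data are reached only through the linkage triples. In this sketch, adding a further data set of a similar kind would require only new records and one more linkage triple, which is in the spirit of the straightforward extension the abstract reports.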