María del Carmen Legaz-García, José Antonio Miñarro-Giménez, Marcos Menárguez-Tortosa, Jesualdo Tomás Fernández-Breis
Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, Murcia, 30071, Spain.
Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, 8036, Austria.
J Biomed Semantics. 2016 Jun 3;7:32. doi: 10.1186/s13326-016-0075-z.
Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating machine-readable content. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats.
We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on mappings between the entities of the data schema and the ontological infrastructure that gives meaning to the content. Our approach permits different types of mappings and allows complex transformation patterns to be defined. Once the mappings are defined, they can be applied automatically to datasets to generate logically consistent content, and they can be reused in further transformation processes.
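The mapping-driven transformation described above can be illustrated with a minimal sketch. It is not SWIT's implementation or API: the `Mapping` class, the `to_ntriples` function, the `example.org` IRIs, and the sample rows are all hypothetical, and the sketch covers only simple column-to-property mappings, without complex transformation patterns or consistency checking.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace for generated resources


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping from one source field to one ontology property."""
    column: str      # field name in the source data schema
    predicate: str   # IRI of the ontology property it maps to


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(rows, id_column, class_iri, mappings):
    """Apply the mappings to every row and return N-Triples lines."""
    triples = []
    for row in rows:
        subject = f"<{BASE}patient/{row[id_column]}>"
        # Each row becomes an instance of the mapped ontology class.
        triples.append(f"{subject} <{RDF_TYPE}> <{class_iri}> .")
        for m in mappings:
            value = row.get(m.column)
            if not value:
                continue  # skip missing or empty source values
            literal = '"' + escape_literal(value) + '"'
            triples.append(f"{subject} <{m.predicate}> {literal} .")
    return triples


# Hypothetical source records and mappings, for illustration only.
rows = [
    {"id": "p1", "diagnosis": "asthma"},
    {"id": "p2", "diagnosis": ""},
]
mappings = [Mapping("diagnosis", "http://example.org/onto#hasDiagnosis")]
triples = to_ntriples(rows, "id", "http://example.org/onto#Patient", mappings)

for line in triples:
    print(line)
```

Because the mappings are plain data that is separate from the rows, the same list can be applied to another dataset with the same schema, which is the kind of reuse the approach relies on.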
The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open biomedical datasets; and (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and report some lessons learned.
We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT applies the Linked Open Data principles when generating datasets, which allows their content to be linked to external repositories and linked open datasets to be created. SWIT datasets may contain data from multiple sources and schemas, and thus become integrated datasets.