Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, 30100, Murcia, Spain.
Comput Methods Programs Biomed. 2024 Jan;243:107918. doi: 10.1016/j.cmpb.2023.107918. Epub 2023 Nov 10.
The adoption of new technologies in clinical care systems has made a large amount of valuable data available. However, this data is usually heterogeneous and must be harmonized before it can be integrated and analysed. We propose a semantic-driven harmonization framework that (1) enables the meaningful sharing and integration of healthcare data across institutions and (2) facilitates the analysis and exploitation of the shared data.
The framework includes an ontology-based common data model (SCDM), a data transformation pipeline and a semantic query system. Heterogeneous datasets, mapped to different terminologies, are integrated by using an ontology-based infrastructure rooted in a top-level ontology. A graph database is generated from these mappings, and a web-based semantic query system facilitates data exploration.
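As a rough illustration of the mapping-then-query idea, the following minimal Python sketch turns records from two sources with different local codes into subject-predicate-object triples that share one concept identifier, and then filters them with a simple pattern match. It is not the SCDM or the project's pipeline: the `http://example.org/scdm/` namespace, the source names, field names, local codes and the helpers `to_triples` and `match` are all hypothetical, and a real deployment would use an RDF store and a query language such as SPARQL rather than Python lists.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; the real SCDM IRIs are not given in the abstract.
EX = "http://example.org/scdm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical per-source mappings: each institution uses its own field
# names and local codes, all mapped to the same shared concept IRI.
SOURCES = {
    "hospital_a": {
        "id_field": "pid",
        "dx_field": "dx",
        "codes": {"A-017": EX + "IschemicStroke"},
    },
    "hospital_b": {
        "id_field": "patient",
        "dx_field": "diagnosis",
        "codes": {"stroke_isch": EX + "IschemicStroke"},
    },
}


def to_triples(source: str, records: list[dict]) -> list[Triple]:
    """Convert one source's records into triples using its mapping."""
    cfg = SOURCES[source]
    triples = []
    for rec in records:
        patient = f"{EX}patient/{source}/{rec[cfg['id_field']]}"
        triples.append(Triple(patient, RDF_TYPE, EX + "Patient"))
        concept = cfg["codes"].get(rec[cfg["dx_field"]])
        if concept is not None:  # unmapped codes produce no diagnosis triple
            triples.append(Triple(patient, EX + "hasDiagnosis", concept))
    return triples


def match(graph: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]
```

Because both sources map to the same concept IRI, a single pattern such as `match(graph, p=EX + "hasDiagnosis", o=EX + "IschemicStroke")` retrieves patients from either institution, which is the kind of cross-dataset exploration the query system is described as supporting.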
Several datasets from different European institutions have been integrated by using the framework in the context of the European H2020 Precise4Q project. Through the query system, data scientists were able to explore data and use it for building machine learning models.
The flexible data representation using RDF, together with the formal semantic underpinning provided by the SCDM, has enabled the semantic integration, querying and advanced exploitation of heterogeneous data in the context of the Precise4Q project.