Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany.
Data Integration Center, Center for Medical Informatics, University Hospital Carl Gustav Carus Dresden, 01307, Dresden, Germany.
BMC Med Inform Decis Mak. 2024 Feb 26;24(1):58. doi: 10.1186/s12911-024-02458-7.
To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e. the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM.
For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps.
From the 23 included publications, we conceptualized a generic data harmonization process for OMOP CDM consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, the extract-transform-load (ETL) process, qualitative data quality analysis, and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools that supported five of the process steps.
The generic data harmonization process can serve as a step-by-step guide to assist other researchers in harmonizing source data into the OMOP CDM.
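To make the step sequence more concrete, the following minimal Python sketch lists the nine process steps in the order given above and illustrates one of them, the coverage analysis of vocabularies. It is an illustration under stated assumptions, not the method or tooling described in the publication: the function name `vocabulary_coverage`, the example source codes, and the placeholder concept IDs are all hypothetical, and a real project would draw its mappings from the OHDSI standardized vocabularies.

```python
from collections import Counter

# The nine process steps, in the order listed in the abstract.
PROCESS_STEPS = [
    "dataset specification",
    "data profiling",
    "vocabulary identification",
    "coverage analysis of vocabularies",
    "semantic mapping",
    "structural mapping",
    "extract-transform-load (ETL) process",
    "qualitative data quality analysis",
    "quantitative data quality analysis",
]


def vocabulary_coverage(source_codes, code_to_concept):
    """Report how well a mapping covers the codes found in the source data.

    source_codes: iterable of source codes, one entry per source record.
    code_to_concept: dict from source code to a target concept ID
        (hypothetical placeholder IDs in this sketch).
    """
    counts = Counter(source_codes)
    # A code counts as mapped only if the mapping holds a non-empty target.
    mapped = {code for code in counts if code_to_concept.get(code)}
    total_records = sum(counts.values())
    return {
        # Share of distinct source codes that have a mapping.
        "code_coverage": len(mapped) / len(counts) if counts else 0.0,
        # Share of source records whose code has a mapping.
        "record_coverage": (
            sum(counts[code] for code in mapped) / total_records
            if total_records
            else 0.0
        ),
        "unmapped_codes": sorted(set(counts) - mapped),
    }


# Hypothetical example: four source records, one code without a mapping.
example_codes = ["E11.9", "E11.9", "I10", "X99"]
example_mapping = {"E11.9": 1001, "I10": 1002}  # placeholder concept IDs
report = vocabulary_coverage(example_codes, example_mapping)
print(report)
```

Reporting coverage both per distinct code and per record is a design choice of this sketch: a rarely used unmapped code lowers the first figure far more than the second, which can help prioritize the semantic mapping effort.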