PharmacoEpidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway.
EdenceHealth NV, Belgium.
Int J Med Inform. 2024 Nov;191:105602. doi: 10.1016/j.ijmedinf.2024.105602. Epub 2024 Aug 14.
Norwegian health registries covering the entire population are used for administration, research, and emergency preparedness. We harmonized these data onto the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) and enriched real-world data in OMOP format with COVID-19-related data.
Data from six registries (2018-2021) covering birth registrations, selected primary and secondary care events, vaccinations, and communicable disease notifications were mapped onto OMOP CDM v5.3. An Extract-Transform-Load (ETL) pipeline was developed on simulated data, guided by data characterization documents and scanning tools. We ran data quality dashboard checks, generated cohorts, investigated differences between source and mapped data, and refined the ETL accordingly.
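The abstract does not show the ETL itself. As a minimal sketch of what a transform step of this kind does, assuming a hypothetical in-memory concept map and placeholder concept IDs (not the authors' implementation), each source code is looked up against a (vocabulary, code) to standard `concept_id` map, and unmapped codes fall back to the OMOP convention `concept_id = 0` ("No matching concept"). The record types, `transform`, and `mapping_coverage` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# OMOP CDM convention: concept_id 0 means "No matching concept".
NO_MATCHING_CONCEPT = 0

# Hypothetical lookup: (source vocabulary, source code) -> standard concept_id.
# In practice this would be derived from the OMOP vocabulary tables ('Maps to'
# relationships) plus any curated SOURCE_TO_CONCEPT_MAP entries.
ConceptMap = Dict[Tuple[str, str], int]


@dataclass(frozen=True)
class SourceRecord:
    person_source_value: str
    vocabulary: str  # e.g. "ICD-10" or "ICPC-2"
    code: str


@dataclass(frozen=True)
class ConditionOccurrence:
    person_source_value: str
    condition_concept_id: int    # standard concept, or 0 if unmapped
    condition_source_value: str  # original code kept for traceability


def transform(records: Iterable[SourceRecord], concept_map: ConceptMap) -> List[ConditionOccurrence]:
    """Transform step: map each source code to a standard concept_id."""
    return [
        ConditionOccurrence(
            r.person_source_value,
            concept_map.get((r.vocabulary, r.code), NO_MATCHING_CONCEPT),
            r.code,
        )
        for r in records
    ]


def mapping_coverage(rows: List[ConditionOccurrence]) -> float:
    """Share of rows that received a non-zero standard concept_id."""
    if not rows:
        return 0.0
    return sum(r.condition_concept_id != NO_MATCHING_CONCEPT for r in rows) / len(rows)


if __name__ == "__main__":
    # 1001 is a placeholder, not a real OMOP concept_id; "LOCAL-1" is a made-up unmapped code.
    concept_map: ConceptMap = {("ICD-10", "U07.1"): 1001}
    rows = transform(
        [SourceRecord("p1", "ICD-10", "U07.1"), SourceRecord("p2", "LOCAL", "LOCAL-1")],
        concept_map,
    )
    print(mapping_coverage(rows))  # 0.5
```

Keeping the original code in `condition_source_value` is what makes the source-versus-mapped comparison described above possible: unmapped rows (concept 0) can be listed and either mapped to standard concepts or given custom concepts.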
We mapped 1.5 billion rows of data for 5,673,845 individuals. These data included 804,277 pregnancies, 483,585 mothers together with 792,477 children, and 472,948 fathers. We identified 382,516 positive COVID-19 tests in 380,794 patients. These figures are consistent with those obtained from the source data. In addition to the 11 million source codes mapped automatically, we mapped 237 non-standard codes to standard concepts and introduced 38 custom concepts to accommodate pregnancy-related terminology not supported by the OMOP CDM vocabularies. In total, 3,700 of 3,705 (99.8%) checks passed. The 5 failed checks are explained by the nature of the data and affect only a small number of records.
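Two of the figures above can be made concrete with a short sketch. The first part assumes the common OHDSI convention that site-specific (custom) concepts use `concept_id` values above 2,000,000,000; the consecutive numbering scheme and the helper name `allocate_custom_concept_ids` are hypothetical, since the abstract does not say which IDs the 38 custom concepts received. The second part recomputes the check pass rate: 3,700/3,705 is about 99.87%, which is 99.8% when truncated to one decimal (ordinary rounding gives 99.9%).

```python
import math
from typing import List

# OHDSI convention: concept_id values above 2,000,000,000 are reserved for
# site-specific (custom) concepts, so they cannot collide with vocabulary IDs.
CUSTOM_CONCEPT_ID_START = 2_000_000_001


def allocate_custom_concept_ids(n: int, start: int = CUSTOM_CONCEPT_ID_START) -> List[int]:
    """Return n consecutive concept_ids in the local range (hypothetical scheme)."""
    if start <= 2_000_000_000:
        raise ValueError("custom concept_ids must be above 2,000,000,000")
    return list(range(start, start + n))


def pass_rate(passed: int, total: int) -> float:
    """Percentage of data quality checks that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


if __name__ == "__main__":
    ids = allocate_custom_concept_ids(38)
    print(ids[0], ids[-1])             # 2000000001 2000000038
    rate = pass_rate(3700, 3705)
    print(f"{rate:.2f}%")              # 99.87%
    print(math.floor(rate * 10) / 10)  # 99.8 (truncated to one decimal)
```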
Norwegian registry data were successfully harmonized onto the OMOP CDM with a high level of concordance and provide a valuable source for federated COVID-19-related research. Our mapping experience is highly valuable for data partners working with Nordic health registries.