Health & Biomedical Research Information Technology Unit (HaBIC R2), Department of General Practice and Primary Care, Faculty of Medicine, Dentistry & Health Sciences, The University of Melbourne, Parkville, Victoria, Australia.
PLoS One. 2024 Apr 18;19(4):e0301557. doi: 10.1371/journal.pone.0301557. eCollection 2024.
The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure that enables analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and 'validation' analyses across multiple institutions in Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts that convert the data to the OMOP-CDM. Mapping the distinct free-text terms extracted from the various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. As a result, a number of terms required manual assignment. To address this issue, we instructed our clinical mappers to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary such as SNOMED once the appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset, we used the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
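The frequency-based mapping strategy can be illustrated with a minimal Python sketch. It is not the authors' SQL implementation: the function name `terms_to_map`, the example terms, and their counts are all hypothetical. It assumes a per-domain tally of how many records use each free-text term, and selects the most frequent terms until more than 95% of records would be covered.

```python
def terms_to_map(term_counts, coverage=0.95):
    """Pick the most frequent terms until their records exceed `coverage`.

    term_counts: dict of free-text term -> number of records using it
                 (hypothetical input; one such dict per domain).
    Returns (selected_terms, achieved_coverage).
    """
    total = sum(term_counts.values())
    if total == 0:
        return [], 0.0

    selected = []
    covered = 0
    # Most frequent first; ties are broken alphabetically for determinism.
    for term, count in sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered / total > coverage:
            break  # target already exceeded, remaining terms stay unmapped
        selected.append(term)
        covered += count
    return selected, covered / total


# Illustrative counts only, not data from the study.
condition_terms = {"hypertension": 900, "htn": 60, "high bp": 30, "hypertensoin": 10}
selected, achieved = terms_to_map(condition_terms)
```

The count of the last selected term acts as the per-domain frequency threshold: terms occurring less often than that are left for later or remain unmapped.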
Across three primary care EMR systems, we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations, each of which compared its outcome against a predefined threshold. A 'FAIL' occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database, we achieved an overall pass rate of 97%.
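The pass/fail rule described above can be sketched in a few lines of Python. This is an illustration of the stated logic only, not the Data Quality Dashboard's own code (the DQD is distributed as an R package); the class `CheckResult`, its fields, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One data quality evaluation (hypothetical structure)."""
    name: str
    violated_rows: int     # rows that do not comply with the check
    total_rows: int        # rows the check was applied to
    threshold_pct: float   # maximum tolerated percentage of violations

    @property
    def pct_violated(self) -> float:
        if self.total_rows == 0:
            return 0.0
        return 100.0 * self.violated_rows / self.total_rows

    @property
    def failed(self) -> bool:
        # FAIL only when the violation percentage exceeds the threshold.
        return self.pct_violated > self.threshold_pct


def pass_rate(results) -> float:
    """Percentage of evaluations that passed."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if not r.failed)
    return 100.0 * passed / len(results)


# Illustrative values only, not results from the study.
results = [
    CheckResult("plausible birth year", 2, 1000, 1.0),     # 0.2% <= 1%  -> pass
    CheckResult("valid gender concept", 0, 1000, 0.0),     # 0%   <= 0%  -> pass
    CheckResult("mapped condition concept", 80, 1000, 5.0),  # 8%  > 5%  -> fail
    CheckResult("visit end after start", 5, 1000, 1.0),    # 0.5% <= 1%  -> pass
]
overall = pass_rate(results)
```

Because the comparison is strict, a check whose violation percentage equals its threshold still passes, which matches the "exceeded the specified threshold" wording.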
The OMOP-CDM's widespread international use, support, and training provide a well-established pathway for data standardisation in collaborative research. Its compatibility allows analysis packages to be shared across local and international research groups, which facilitates rapid and reproducible data comparisons. The model is supported by a suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1). Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.