Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, London, UK.
J Am Med Inform Assoc. 2022 Dec 13;30(1):103-111. doi: 10.1093/jamia/ocac203.
The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers, and common data models (CDMs) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500 000 participants with rich genetic and phenotypic data, to the Observational Medical Outcomes Partnership (OMOP) CDM.
We converted UK Biobank data to OMOP CDM v. 5.3. We transformed participant research data on diseases collected at recruitment and electronic health records (EHRs) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data.
We identified 502 505 participants (3086 with COVID-19) and transformed 690 fields (1 373 239 555 rows) to the OMOP CDM using 8 different controlled clinical terminologies and bespoke mappings. Among the self-reported data, we transformed 946 053 noncancer illnesses (83.91% of all source entries), 37 802 cancers (70.81%), 1 218 935 medications (88.25%), and 864 788 prescriptions (86.96%). Among the EHR data, we transformed 13 028 182 (99.95%) hospital diagnoses, 6 465 399 (89.2%) procedures, 337 896 333 primary care diagnoses (CTV3, SNOMED-CT), 139 966 587 (98.74%) prescriptions (dm+d), and 77 127 (99.95%) deaths (ICD-10). We observed good concordance in demographics, risk factors, and comorbidities between source and transformed data.
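The coverage percentages reported above are the share of source entries that received a standard concept after mapping. A minimal Python sketch of that calculation follows, assuming a simple lookup-table approach. The function names, the toy lookup table, and the concept IDs in it are hypothetical illustrations and are not taken from the study's pipeline; the only OMOP convention relied on is that unmapped records carry concept ID 0.

```python
# Hypothetical toy lookup: (vocabulary, source code) -> standard concept ID.
# The codes and concept IDs below are illustrative placeholders, not real mappings.
TOY_CONCEPT_MAP = {
    ("ICD10", "I10"): 1001,
    ("ICD10", "E11"): 1002,
    ("CTV3", "X0001"): 1001,
}

# OMOP convention: records without a standard concept are assigned concept ID 0.
UNMAPPED_CONCEPT_ID = 0


def map_records(records, concept_map):
    """Return copies of the records with a 'concept_id' field added."""
    mapped = []
    for rec in records:
        key = (rec["vocabulary"], rec["code"])
        concept_id = concept_map.get(key, UNMAPPED_CONCEPT_ID)
        mapped.append({**rec, "concept_id": concept_id})
    return mapped


def mapping_coverage(mapped_records):
    """Percentage of records that received a non-zero concept ID."""
    if not mapped_records:
        return 0.0
    n_mapped = sum(
        1 for rec in mapped_records if rec["concept_id"] != UNMAPPED_CONCEPT_ID
    )
    return round(100.0 * n_mapped / len(mapped_records), 2)


# Toy source entries; the last one has no entry in the lookup table.
source_entries = [
    {"vocabulary": "ICD10", "code": "I10"},
    {"vocabulary": "ICD10", "code": "E11"},
    {"vocabulary": "CTV3", "code": "X0001"},
    {"vocabulary": "ICD10", "code": "Z99"},
]

mapped_entries = map_records(source_entries, TOY_CONCEPT_MAP)
coverage = mapping_coverage(mapped_entries)
```

Under these assumptions, three of the four toy entries are mapped, so `coverage` is 75.0. A real conversion would draw its mappings from the OMOP standardized vocabularies and bespoke mapping tables rather than from a hand-written dictionary.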
Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. It also uncovered several challenges in transforming questionnaire data to the OMOP CDM, which require further research. The transformed UK Biobank resource is a valuable tool that can enable federated research, such as COVID-19 studies.