Medical Sciences Division, University of Oxford, Oxford, UK.
Pharmaco- and Device Epidemiology, Centre for Statistics in Medicines, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.
Pharmacoepidemiol Drug Saf. 2024 Nov;33(11):e70042. doi: 10.1002/pds.70042.
Generating representative disease phenotypes is important for ensuring that the findings of observational studies are reliable. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real-world data for use in the Data Analysis and Real-World Interrogation Network (DARWIN EU). We illustrate the use of this framework by generating phenotypes for two diseases: pancreatic cancer and systemic lupus erythematosus (SLE).
The phenotyping process comprises 14 steps, based on a standard operating procedure co-created by the DARWIN EU Coordination Centre in collaboration with the European Medicines Agency. Several bespoke R packages were used to generate and review codelists for the two phenotypes, based on real-world data mapped to the OMOP Common Data Model.
Codelists were generated for both pancreatic cancer and SLE, and cohorts were generated in six OMOP-mapped databases. Diagnostic checks showed that these cohorts had incidence and prevalence figures broadly similar to those in previously published literature, despite significant inter-database variability. Co-occurring symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge.
Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the resulting cohorts. Wider use of structured and reproducible phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.
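One of the diagnostic checks described above, comparing cohort incidence in each database against previously published figures, can be illustrated with a minimal Python sketch. This is not the authors' tooling (the study used bespoke R packages); the names `CohortCounts`, `incidence_per_100k`, and `check_against_literature`, the database labels, the counts, and the reference range are all hypothetical placeholders chosen for illustration, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """Hypothetical per-database summary of a phenotype cohort."""
    database: str
    new_cases: int        # incident cases observed in the study period
    person_years: float   # total person-time at risk


def incidence_per_100k(counts: CohortCounts) -> float:
    """Crude incidence rate per 100,000 person-years."""
    if counts.person_years <= 0:
        raise ValueError("person_years must be positive")
    return counts.new_cases * 100_000 / counts.person_years


def check_against_literature(cohorts, low, high):
    """Map each database to (rate, True if rate lies within [low, high])."""
    results = {}
    for cohort in cohorts:
        rate = incidence_per_100k(cohort)
        results[cohort.database] = (rate, low <= rate <= high)
    return results


# Placeholder inputs: made-up counts and a made-up literature range.
example_cohorts = [
    CohortCounts("db_a", new_cases=120, person_years=1_000_000.0),
    CohortCounts("db_b", new_cases=450, person_years=1_500_000.0),
]
example_results = check_against_literature(example_cohorts, low=5.0, high=20.0)

for name, (rate, within) in example_results.items():
    status = "within" if within else "outside"
    print(f"{name}: {rate:.1f} per 100,000 person-years ({status} reference range)")
```

A rate outside the reference range would not by itself invalidate a phenotype, since the abstract reports substantial variability between databases; in a sketch like this it would simply flag a cohort for closer review.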