Standardised and Reproducible Phenotyping Using Distributed Analytics and Tools in the Data Analysis and Real World Interrogation Network (DARWIN EU).

Affiliations

Medical Sciences Division, University of Oxford, Oxford, UK.

Pharmaco- and Device Epidemiology, Centre for Statistics in Medicines, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK.

Publication information

Pharmacoepidemiol Drug Saf. 2024 Nov;33(11):e70042. doi: 10.1002/pds.70042.

DOI:10.1002/pds.70042

PMID:39532529

Abstract

PURPOSE

The generation of representative disease phenotypes is important for ensuring the reliability of the findings of observational studies. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real world data for use in the Data Analysis and Real-World Interrogation Network (DARWIN EU). We illustrate the use of this framework by generating phenotypes for two diseases: pancreatic cancer and systemic lupus erythematosus (SLE).

METHODS

The phenotyping process comprises 14 steps, based on a standard operating procedure co-created by the DARWIN EU Coordination Centre in collaboration with the European Medicines Agency. Several bespoke R packages were used to generate and review codelists for the two phenotypes based on real-world data mapped to the OMOP Common Data Model.

RESULTS

Codelists were generated for both pancreatic cancer and SLE, and cohorts were generated in six OMOP-mapped databases. Diagnostic checks showed that these cohorts had incidence and prevalence figures broadly similar to those in previously published literature, despite significant inter-database variability. Co-occurring symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge.

CONCLUSIONS

Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the resulting cohorts. Wider use of structured and reproducible phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.

Similar articles

Standardised and Reproducible Phenotyping Using Distributed Analytics and Tools in the Data Analysis and Real World Interrogation Network (DARWIN EU).

Pharmacoepidemiol Drug Saf. 2024 Nov;33(11):e70042. doi: 10.1002/pds.70042.

Prevalence and incidence of systemic lupus erythematosus in the adult population of Estonia.

Lupus. 2017 Sep;26(10):1115-1120. doi: 10.1177/0961203316686705. Epub 2017 Jan 6.

Increased cancer risk in patients with cutaneous lupus erythematosus and systemic lupus erythematosus compared with the general population: A Danish nationwide cohort study.

Lupus. 2021 Apr;30(5):752-761. doi: 10.1177/0961203321990106. Epub 2021 Jan 26.

Translating and evaluating historic phenotyping algorithms using SNOMED CT.

J Am Med Inform Assoc. 2023 Jan 18;30(2):222-232. doi: 10.1093/jamia/ocac158.

Facilitating phenotype transfer using a common data model.

J Biomed Inform. 2019 Aug;96:103253. doi: 10.1016/j.jbi.2019.103253. Epub 2019 Jul 17.

A systematic review of validated methods for identifying systemic lupus erythematosus (SLE) using administrative or claims data.

Vaccine. 2013 Dec 30;31 Suppl 10:K62-73. doi: 10.1016/j.vaccine.2013.06.104.

Erratum: High-Throughput Identification of Resistance to Pseudomonas syringae pv. Tomato in Tomato using Seedling Flood Assay.

J Vis Exp. 2023 Oct 18(200). doi: 10.3791/6576.

Checklist and guidance on creating codelists for routinely collected health data research.

NIHR Open Res. 2024 Sep 18;4:20. doi: 10.3310/nihropenres.13550.2. eCollection 2024.

Using a data-driven approach for the development and evaluation of phenotype algorithms for systemic lupus erythematosus.

PLoS One. 2023 Feb 16;18(2):e0281929. doi: 10.1371/journal.pone.0281929. eCollection 2023.

IncidencePrevalence: An R package to calculate population-level incidence rates and prevalence using the OMOP common data model.

Pharmacoepidemiol Drug Saf. 2024 Jan;33(1):e5717. doi: 10.1002/pds.5717. Epub 2023 Oct 25.

Cited by

Core Concepts in Pharmacoepidemiology: Multi-Database Distributed Data Networks.

Pharmacoepidemiol Drug Saf. 2025 Jul;34(7):e70177. doi: 10.1002/pds.70177.

How to Design Electronic Case Report Form (eCRF) Questions to Maximize Semantic Interoperability in Clinical Research.

Interact J Med Res. 2025 Mar 3;14:e51598. doi: 10.2196/51598.

CohortDiagnostics: Phenotype evaluation across a network of observational data sources using population-level characterization.

PLoS One. 2025 Jan 16;20(1):e0310634. doi: 10.1371/journal.pone.0310634. eCollection 2025.

Objective study validity diagnostics: a framework requiring pre-specified, empirical verification to increase trust in the reliability of real-world evidence.

J Am Med Inform Assoc. 2025 Mar 1;32(3):518-525. doi: 10.1093/jamia/ocae317.

An operationalization framework for lifecycle health technology assessment: a Health Technology Assessment International Global Policy Forum Task Force report.

Int J Technol Assess Health Care. 2024 May 16;40(1):e45. doi: 10.1017/S0266462324000199.
