Rao Gowtham A, Shoaibi Azza, Makadia Rupa, Hardin Jill, Swerdel Joel, Weaver James, Voss Erica A, Conover Mitchell M, Fortin Stephen, Sena Anthony G, Knoll Chris, Hughes Nigel, Gilbert James P, Blacketer Clair, Andryc Alan, DeFalco Frank, Molinaro Anthony, Reps Jenna, Schuemie Martijn J, Ryan Patrick B
Observational Health Data Analytics, Janssen Research and Development, LLC, Titusville, NJ, United States of America.
OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States of America.
PLoS One. 2025 Jan 16;20(1):e0310634. doi: 10.1371/journal.pone.0310634. eCollection 2025.
This paper introduces a novel framework for evaluating phenotype algorithms (PAs) using the open-source tool, Cohort Diagnostics.
The method applies several diagnostic criteria to the patient cohort returned by a PA. The diagnostics include estimates of incidence rate, a breakdown of the codes that qualified patients for cohort entry on the index date, and the prevalence of all observed clinical events before, on, and after the index date. We test our framework by evaluating one PA for systemic lupus erythematosus (SLE) and two PAs for Alzheimer's disease (AD) across 10 different observational data sources.
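As a rough illustration of the first diagnostic, the sketch below computes an incidence rate per 1,000 person-years stratified by sex. It is not the Cohort Diagnostics implementation or API; the `PersonTime` record, the function name, and the toy data are hypothetical and only show the underlying arithmetic (new cases divided by person-time at risk).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PersonTime:
    """Hypothetical record: one person's time at risk and case status."""
    sex: str
    years_at_risk: float
    is_case: bool  # True if the person entered the cohort during follow-up


def incidence_rate_per_1000py(records):
    """Return incident cases per 1,000 person-years for each sex stratum."""
    cases = defaultdict(int)
    person_years = defaultdict(float)
    for r in records:
        person_years[r.sex] += r.years_at_risk
        if r.is_case:
            cases[r.sex] += 1
    # Skip strata with no person-time to avoid division by zero.
    return {
        sex: 1000.0 * cases[sex] / py
        for sex, py in person_years.items()
        if py > 0
    }


# Toy data, invented purely for illustration.
records = [
    PersonTime("F", 2.0, True),
    PersonTime("F", 3.0, False),
    PersonTime("M", 4.0, False),
    PersonTime("M", 6.0, True),
]
rates = incidence_rate_per_1000py(records)
```

In a check of this kind, a stratified rate that departs from the known epidemiology of the disease (for SLE, a female rate that is not higher than the male rate) would be one signal that the PA may be misclassifying patients.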
Using Cohort Diagnostics, we found that the population-level characteristics of individuals in the SLE cohort closely matched the disease's anticipated clinical profile. Specifically, the incidence rate of SLE was consistently higher among females. Moreover, expected clinical events such as laboratory tests, treatments, and repeated diagnoses were also observed. For AD, although one PA identified considerably fewer patients, the absence of notable differences in clinical characteristics between the two cohorts suggested similar specificity.
We provide a practical, data-driven approach to evaluating PAs across a network of OMOP data sources, using two diseases as examples. Cohort Diagnostics can ensure that the subjects identified by a specific PA align with those intended for inclusion in a research study.
Diagnostics based on large-scale population-level characterization can offer insights into the misclassification errors of PAs.