Martin Benjamin, Kelly Will, Morgan-Cooper Hannah, Falconer Thomas, Park Elizabeth, Desai Priya, Fiorentino David, Chung Lorinda, Yen Sean, Wang Zachary, Saygin Didem, George Michael, Rao Gowtham A, Swerdel Joel, Shoaibi Azza, Mecoli Christopher A
Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
Stanford University School of Medicine and Stanford Health Care, Redwood City, CA.
Arthritis Care Res (Hoboken). 2025 Aug 12. doi: 10.1002/acr.25625.
Studying rare diseases like dermatomyositis (DM) in single-center cohorts is challenging due to small sample sizes and limited generalizability. This study develops and evaluates case identification algorithms for DM to enable coordinated analysis across multiple data sources.
Case identification algorithms were developed to identify adult DM patients within eleven independent electronic health record or claims databases, totaling over 800 million patients, using the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Algorithm performance was assessed through manual chart review and with the OHDSI open-source tools CohortDiagnostics and PheValuator, which quantify incidence rates and performance metrics such as sensitivity and positive predictive value (PPV).
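For readers less familiar with the two metrics named above, the following is a minimal sketch of how sensitivity and PPV are derived from classification counts. It is not PheValuator's implementation; the class name, function names, and counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing an algorithm against a reference standard."""
    true_positives: int   # flagged by the algorithm, confirmed DM
    false_positives: int  # flagged by the algorithm, not DM
    false_negatives: int  # confirmed DM, missed by the algorithm


def sensitivity(counts: ConfusionCounts) -> float:
    """Share of true cases that the algorithm identifies."""
    denominator = counts.true_positives + counts.false_negatives
    return counts.true_positives / denominator if denominator else float("nan")


def positive_predictive_value(counts: ConfusionCounts) -> float:
    """Share of algorithm-flagged patients who are true cases."""
    denominator = counts.true_positives + counts.false_positives
    return counts.true_positives / denominator if denominator else float("nan")


# Hypothetical counts, not data from the study.
example = ConfusionCounts(true_positives=45, false_positives=5, false_negatives=55)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"PPV = {positive_predictive_value(example):.2f}")
```

The sketch shows why the two metrics can diverge: a restrictive algorithm flags few non-cases (high PPV) while missing many true cases (lower sensitivity).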
Eight DM case identification algorithms were evaluated across eleven databases. Performance varied significantly, with sensitivity and PPV differing by more than 30% between some databases. Overall, we identified one incidence algorithm and one prevalence algorithm with good performance: sensitivities of 42% and 49% and PPVs of 83% and 84%, respectively. PheValuator quantified algorithm performance within each database, allowing direct comparison of different criteria. Additionally, CohortDiagnostics generated incidence rates stratified by age decile and sex, which aligned with previous epidemiological data.
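As an illustration of what an incidence rate stratified by age decile and sex means, here is a minimal sketch. It does not reproduce CohortDiagnostics; the record layout, function names, and values are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (age in years, sex, person-years at risk, incident case?)
Record = Tuple[int, str, float, bool]


def age_decile(age: int) -> str:
    """Label the ten-year age band containing `age`, e.g. 45 -> '40-49'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def stratified_incidence(
    records: Iterable[Record], per: int = 100_000
) -> Dict[Tuple[str, str], float]:
    """Incidence rate per `per` person-years for each (age decile, sex) stratum."""
    cases: Dict[Tuple[str, str], int] = defaultdict(int)
    time_at_risk: Dict[Tuple[str, str], float] = defaultdict(float)
    for age, sex, person_years, is_case in records:
        stratum = (age_decile(age), sex)
        time_at_risk[stratum] += person_years
        cases[stratum] += int(is_case)
    # Skip strata with no time at risk to avoid dividing by zero.
    return {s: per * cases[s] / t for s, t in time_at_risk.items() if t > 0}


# Made-up records, not data from the study.
records = [
    (45, "F", 2.0, True),
    (47, "F", 3.0, False),
    (52, "M", 5.0, False),
]
rates = stratified_incidence(records)
for stratum, rate in sorted(rates.items()):
    print(stratum, rate)
```

Each stratum's rate is its case count divided by its summed person-time, so strata can be compared directly with published age- and sex-specific estimates.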
We developed and validated multiple DM case identification algorithms across diverse databases, demonstrating their accuracy through multiple evaluation methods. This approach enables more generalizable, reproducible research using real-world data and can be applied to other rheumatic diseases.