Hyde Bethany, Paoli Carly J, Panjabi Sumeet, Bettencourt Katherine C, Bell Lynum Karimah S, Selej Mona
Janssen Business Technology Commercial Data Insights & Data Science Titusville New Jersey USA.
Janssen Scientific Affairs, Inc. Titusville New Jersey USA.
Pulm Circ. 2023 Jun 6;13(2):e12237. doi: 10.1002/pul2.12237. eCollection 2023 Apr.
Many patients with pulmonary arterial hypertension (PAH) experience substantial delays in diagnosis, which are associated with worse outcomes and higher costs. Tools for diagnosing PAH sooner may lead to earlier treatment, which may delay disease progression and adverse outcomes including hospitalization and death. We developed a machine-learning (ML) algorithm to identify patients at risk for PAH earlier in their symptom journey and to distinguish them from patients with similar early symptoms who are not at risk for developing PAH. Our supervised ML model analyzed retrospective, de-identified data from the US-based Optum® Clinformatics® Data Mart claims database (January 2015 to December 2019). Propensity score-matched PAH and non-PAH (control) cohorts were established based on observed differences. Random forest models were used to classify patients as PAH or non-PAH at diagnosis and at 6 months prediagnosis. The PAH and non-PAH cohorts included 1339 and 4222 patients, respectively. At 6 months prediagnosis, the model performed well in distinguishing PAH from non-PAH patients, with an area under the receiver operating characteristic curve of 0.84, recall (sensitivity) of 0.73, and precision of 0.50. Key features distinguishing the PAH cohort from the non-PAH cohort were a longer time between first symptom and the prediagnosis model date (i.e., 6 months before diagnosis); more diagnostic and prescription claims, circulatory claims, and imaging procedures, leading to higher overall healthcare resource utilization; and more hospitalizations. Our model distinguishes between patients with and without PAH at 6 months before diagnosis and illustrates the feasibility of using routine claims data to identify, at a population level, patients who might benefit from PAH-specific screening and/or earlier specialist referral.
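The three reported metrics (area under the ROC curve, recall, and precision) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy patients below are assumptions made purely for illustration, and the numbers are unrelated to the study's data.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = PAH, 0 = non-PAH)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 PAH and 5 non-PAH patients with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 (not taken from the paper).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.2f}")
```

Read this way, a recall of 0.73 means that roughly three of every four patients later diagnosed with PAH were flagged 6 months earlier, while a precision of 0.50 means that about half of the flagged patients were in the PAH cohort.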