Weissler Elizabeth Hope, Lippmann Steven J, Smerek Michelle M, Ward Rachael A, Kansal Aman, Brock Adam, Sullivan Robert C, Long Chandler, Patel Manesh R, Greiner Melissa A, Hardy N Chantelle, Curtis Lesley H, Jones W Schuyler
Division of Vascular and Endovascular Surgery, Duke University School of Medicine, Durham, NC, United States.
Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States.
JMIR Med Inform. 2020 Aug 19;8(8):e18542. doi: 10.2196/18542.
Peripheral artery disease (PAD) affects 8 to 10 million Americans, who face significantly elevated risks of both mortality and major limb events such as amputation. Unfortunately, PAD is relatively underdiagnosed, undertreated, and underresearched, leading to wide variations in treatment patterns and outcomes. Efforts to improve PAD care and outcomes have been hampered by persistent difficulties identifying patients with PAD for clinical and investigatory purposes.
This study aimed to develop and validate a model-based algorithm that detects patients with PAD using data from an electronic health record (EHR) system.
An initial query of the EHR in a large health system identified all patients with PAD-related diagnosis codes for any encounter during the study period. Clinical adjudication of PAD diagnosis was performed by chart review on a random subgroup. A binary logistic regression to predict PAD was built and validated using a least absolute shrinkage and selection operator (LASSO) approach in the adjudicated patients. The algorithm was then applied to the nonsampled records to further evaluate its performance.
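The modeling step above can be illustrated with a minimal sketch of an L1-penalized (LASSO) binary logistic regression. This is not the authors' code: the synthetic flags, the penalty `lam`, the step size, and the choice of proximal gradient descent as the solver are all assumptions made for illustration, and the names `fit_lasso_logistic` and `make_synthetic` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.02, step=0.5, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    Returns (intercept, weights). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Gradient step on the average log-loss, then L1 shrinkage.
        b -= step * grad_b / n
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return b, w


def make_synthetic(n=400, seed=0):
    # Hypothetical data: six binary flags per patient, of which only the
    # first two influence the simulated outcome.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        flags = [1 if rng.random() < 0.3 else 0 for _ in range(6)]
        logit = -1.5 + 2.5 * flags[0] + 2.0 * flags[1]
        y.append(1 if rng.random() < sigmoid(logit) else 0)
        X.append(flags)
    return X, y


X, y = make_synthetic()
intercept, weights = fit_lasso_logistic(X, y)
```

The L1 penalty shrinks the coefficients of uninformative flags toward zero, which is why a LASSO approach suits a setting with many candidate code flags. In practice, the penalty strength would be chosen by cross-validation rather than fixed as it is here.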
The initial EHR data query using 406 diagnostic codes yielded 15,406 patients, of whom 2500 were randomly selected for adjudication of ground truth PAD status. After rarely used and never used codes were removed, 108 code flags remained. We entered these code flags, together with administrative encounter, imaging, procedure, and specialist flags, into a LASSO model. The area under the curve for this model was 0.862.
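To make the reported metric concrete, the sketch below computes an area under the curve from labels and predicted scores, assuming the metric is the area under the receiver operating characteristic curve. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name `roc_auc` and the toy inputs are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank statistic; tied scores count half."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Average of the 1-based ranks i+1 .. j shared by the tied group.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy example: 1 = adjudicated PAD, 0 = no PAD (illustrative values only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 0.75
```

A value of 0.5 corresponds to scores that carry no ranking information, and 1.0 to perfect separation of positives from negatives.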
The algorithm we constructed has two main advantages over other approaches to the identification of patients with PAD. First, it was derived from a broad population of patients with many different PAD manifestations and treatment pathways across a large health system. Second, our model does not rely on clinical notes and can be applied in situations in which only administrative billing data (eg, large administrative data sets) are available. A combination of diagnosis codes and administrative flags can accurately identify patients with PAD in large cohorts.