HealthCore, Wilmington, DE, USA.
Value Health. 2018 Sep;21(9):1098-1103. doi: 10.1016/j.jval.2018.03.008. Epub 2018 May 7.
The accuracy with which hemophilia A can be identified in claims databases is unknown.
To develop and validate an algorithm, using predictive modeling supported by machine learning, that identifies patients with hemophilia A in an administrative claims database.
We first created a screening algorithm using medical and pharmacy claims to identify potential hemophilia A patients in the US HealthCore Integrated Research Database between January 1, 2006 and April 30, 2015. Medical records for a random sample of patients were reviewed to confirm case status. In this validation sample, we used lasso logistic regression with cross-validation to select covariates in claims data and develop a predictive model to estimate the probability of being a confirmed hemophilia A case.
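The covariate-selection step can be illustrated with a minimal sketch. The abstract does not report the software, penalty grid, or number of folds, so everything below is an assumption: the data are synthetic, the feature names (`male`, `factor_viii_claim`, `noise_a`, `noise_b`) are hypothetical, and the solver is a plain proximal gradient loop chosen only to keep the example dependency-free. It shows how an L1 (lasso) penalty drives uninformative coefficients to exactly zero and how k-fold cross-validation can pick the penalty strength.

```python
import math
import random

FEATURES = ["male", "factor_viii_claim", "noise_a", "noise_b"]  # hypothetical names


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def predict_proba(row, w, b):
    """Estimated probability of being a confirmed case for one patient row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=200):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept b is left unpenalized; lam is the penalty strength.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = predict_proba(row, w, b) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b


def mean_log_loss(X, y, w, b):
    """Average negative log-likelihood on held-out rows."""
    eps = 1e-12
    total = 0.0
    for row, label in zip(X, y):
        p = min(max(predict_proba(row, w, b), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(y)


def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Pick the penalty with the lowest mean held-out log loss over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = {}
    for lam in lambdas:
        total = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            w, b = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
            total += mean_log_loss([X[i] for i in fold], [y[i] for i in fold], w, b)
        scores[lam] = total / k
    return min(scores, key=scores.get), scores


def make_demo_data(n=150, seed=0):
    """Synthetic stand-in for claims-derived binary flags; not real patient data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in range(len(FEATURES))]
        logit = -2.0 + 1.5 * row[0] + 2.5 * row[1]  # noise columns have no effect
        X.append(row)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


X, y = make_demo_data()
best_lam, cv_scores = cv_select_lambda(X, y, lambdas=[0.01, 0.05, 0.2])
weights, intercept = fit_lasso_logistic(X, y, best_lam)
selected = [name for name, wj in zip(FEATURES, weights) if wj != 0.0]
print("chosen penalty:", best_lam, "selected covariates:", selected)
```

In practice a library implementation would normally replace the hand-written solver; the loop is spelled out here only so that the shrink-to-zero behavior of the penalty is visible.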
The screening algorithm identified 2,252 patients and we reviewed medical records for 400 of these patients. The screening algorithm had a positive predictive value (PPV) of 65%. The predictive model identified 18 predictors of being a hemophilia A case or noncase. The strongest predictors of case status included male sex, factor VIII therapy, office visits for hemophilia A, and hospitalizations for hemophilia A. The strongest predictors of noncase status included hospitalizations for reasons other than hemophilia A and factor VIIa therapy. A probability threshold of ≥0.6 resulted in a PPV of 94.7% (95% CI: 92.0-97.5) and sensitivity of 94.4% (95% CI: 91.5-97.2).
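How a probability threshold translates into PPV and sensitivity can be sketched as follows. The predicted probabilities and chart-review labels below are toy values, not study data, and the normal-approximation (Wald) interval is an assumption, since the abstract does not say how its confidence intervals were computed.

```python
import math


def ppv_and_sensitivity(probs, is_case, threshold=0.6):
    """Classify as a case when the predicted probability is >= threshold,
    then compare against chart-confirmed status."""
    tp = fp = fn = 0
    for p, confirmed in zip(probs, is_case):
        flagged = p >= threshold
        if flagged and confirmed:
            tp += 1
        elif flagged and not confirmed:
            fp += 1
        elif not flagged and confirmed:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity, tp, fp, fn


def wald_ci(successes, total, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1].
    Assumed method for illustration only."""
    p = successes / total
    half = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)


# Toy predicted probabilities and confirmed case status (1 = confirmed case).
probs = [0.95, 0.80, 0.65, 0.60, 0.55, 0.30, 0.10, 0.70]
is_case = [1, 1, 1, 0, 1, 0, 0, 1]

ppv, sens, tp, fp, fn = ppv_and_sensitivity(probs, is_case, threshold=0.6)
print("PPV:", ppv, "CI:", wald_ci(tp, tp + fp))
print("Sensitivity:", sens, "CI:", wald_ci(tp, tp + fn))
```

Raising the threshold generally trades sensitivity for PPV, so the cut-off is usually chosen by inspecting both measures across a range of candidate values.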
We developed and validated an algorithm to identify hemophilia A cases in an administrative claims database with high sensitivity and high PPV.