Li Ang, da Costa Wilson L, Guffey Danielle, Milner Emily M, Allam Anthony K, Kurian Karen M, Novoa Francisco J, Poche Marguerite D, Bandyo Raka, Granada Carolina, Wallace Courtney D, Zakai Neil A, Amos Christopher I
Section of Hematology-Oncology, Baylor College of Medicine, Houston, Texas, USA.
Section of Epidemiology and Population Science, Baylor College of Medicine, Houston, Texas, USA.
Res Pract Thromb Haemost. 2022 May 25;6(4):e12733. doi: 10.1002/rth2.12733. eCollection 2022 May.
Research on venous thromboembolism (VTE) that relies only on the International Classification of Diseases (ICD) can misclassify outcomes. Our study aims to discover and validate an improved VTE computable phenotype for people with cancer.
We used a cancer registry electronic health record (EHR)-linked longitudinal database. We derived three algorithms: one ICD/medication based, one natural language processing (NLP) based, and one combining both. We then randomly sampled 400 patients from those with VTE codes (n = 1111) and 400 from those without VTE codes (n = 7396). Weighted sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated on the entire sample using inverse probability weighting, followed by bootstrapped receiver operating characteristic curve analysis to calculate the concordance statistic (c statistic).
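The inverse probability weighting step can be illustrated with a minimal sketch. It assumes each sampled patient is weighted by the size of their stratum divided by the number sampled from it (1111/400 and 7396/400); the abstract does not spell out the exact weighting formula, so this is one plausible reading rather than the authors' implementation. The function name `weighted_metrics` and the toy records are hypothetical and are not study data.

```python
# Stratum sizes and sample size as reported in the abstract.
N_CODED, N_UNCODED, N_SAMPLED = 1111, 7396, 400

# Assumed weights: each sampled patient stands in for
# (stratum size / number sampled) patients in the full cohort.
W_CODED = N_CODED / N_SAMPLED
W_UNCODED = N_UNCODED / N_SAMPLED


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def weighted_metrics(records):
    """Compute weighted accuracy metrics.

    records: iterable of (weight, predicted_vte, confirmed_vte) tuples.
    """
    tp = fp = fn = tn = 0.0
    for weight, predicted, confirmed in records:
        if predicted and confirmed:
            tp += weight
        elif predicted and not confirmed:
            fp += weight
        elif not predicted and confirmed:
            fn += weight
        else:
            tn += weight
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical toy records for illustration only (not study data).
toy_records = [
    (W_CODED, True, True),
    (W_CODED, True, False),
    (W_CODED, False, True),
    (W_UNCODED, False, False),
    (W_UNCODED, False, True),
]
print(weighted_metrics(toy_records))
```

Because patients without VTE codes are sampled at a much lower rate, each missed case in that stratum carries a larger weight, which is why weighted sensitivity can differ substantially from the unweighted figure in the 800-patient sample.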
Among 800 patients sampled, 280 had a confirmed acute VTE during the first year after cancer diagnosis. The ICD/medication algorithm had a weighted PPV of 95% and a weighted sensitivity of 81%, with a c statistic of 0.90 (95% confidence interval [CI], 0.89-0.91). Adding Current Procedural Terminology codes for inferior vena cava filter removal or early death did not improve the performance. The NLP algorithm had a weighted PPV of 80% and a weighted sensitivity of 90%, with a c statistic of 0.93 (95% CI, 0.92-0.94). The combined algorithm had a weighted PPV of 98% at the higher cutoff and a weighted sensitivity of 96% at the lower cutoff, with a c statistic of 0.98 (95% CI, 0.97-0.98).
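As a rough illustration of how a c statistic and its bootstrapped confidence interval might be computed from weighted data, the sketch below uses a pairwise weighted concordance and a percentile bootstrap. Both choices are assumptions: the abstract does not state which bootstrap variant or software was used, and the function names are hypothetical. For an algorithm with a single yes/no output, this concordance reduces to the average of weighted sensitivity and weighted specificity.

```python
import random


def weighted_c_statistic(records):
    """Weighted concordance between score and confirmed outcome.

    records: list of (weight, score, confirmed_vte) tuples.
    Each case/non-case pair counts with weight w_case * w_noncase;
    tied scores count as half concordant.
    """
    cases = [(w, s) for w, s, confirmed in records if confirmed]
    controls = [(w, s) for w, s, confirmed in records if not confirmed]
    if not cases or not controls:
        return float("nan")
    concordant = total = 0.0
    for w_case, s_case in cases:
        for w_ctrl, s_ctrl in controls:
            pair_weight = w_case * w_ctrl
            total += pair_weight
            if s_case > s_ctrl:
                concordant += pair_weight
            elif s_case == s_ctrl:
                concordant += 0.5 * pair_weight
    return concordant / total


def bootstrap_ci(records, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the weighted c statistic."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(records, k=len(records))
        c = weighted_c_statistic(resample)
        if c == c:  # skip resamples with only cases or only non-cases (NaN)
            estimates.append(c)
    if not estimates:
        return float("nan"), float("nan")
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1,
                          int((1 - alpha / 2) * len(estimates)))]
    return lower, upper
```

Calling `bootstrap_ci(records)` on a list of `(weight, score, confirmed_vte)` tuples returns a lower and upper bound; a fixed `seed` keeps the interval reproducible.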
Our ICD/medication-based algorithm can accurately identify the VTE phenotype among patients with cancer, with a high PPV of 95%. The combined algorithm should be considered in EHR databases that have NLP capabilities.