Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.
Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA.
Pharmacoepidemiol Drug Saf. 2024 Jan;33(1):e5684. doi: 10.1002/pds.5684. Epub 2023 Aug 31.
We aimed to determine whether integrating concepts extracted by natural language processing (NLP) from electronic health record (EHR) notes with claims data could improve the identification of gout flares.
Using Medicare claims linked with EHR data, we selected patients with gout who initiated urate-lowering therapy (ULT). Each patient's 12-month baseline period and on-treatment follow-up were segmented into 1-month units. We retrieved EHR notes for months with gout diagnosis codes and processed them with NLP to extract concepts. In a random sample of 500 patients, we reviewed each of their notes for the presence of a physician-documented gout flare; months containing at least 1 note mentioning a gout flare were considered months with events. We used data from 60% of the patients to train predictive models with the least absolute shrinkage and selection operator (LASSO). We evaluated the models in the validation data by the area under the receiver operating characteristic curve (AUC) and examined positive and negative predictive values (PPV/NPV).
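The abstract does not state how LASSO was fitted (solver, penalty tuning). As a minimal, standard-library-only sketch of why a LASSO model "selects" a subset of variables, the code below fits an L1-penalized logistic regression by proximal gradient descent (ISTA), which is one assumed solver among several. The soft-thresholding step sets weak coefficients to exactly zero, and the variables with nonzero coefficients are the selected ones. The names `fit_lasso_logistic`, `lam`, `step`, and `n_iter` are hypothetical; in practice the penalty would usually be tuned (for example by cross-validation), and the rows would be the labeled patient-months of the training patients only, since the split was made at the patient level (60% of patients).

```python
import math


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 norm: shrink toward 0 and clip small values to exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Hypothetical L1-penalized logistic regression fitted by ISTA.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    X is a list of numeric feature rows, y a list of 0/1 labels (1 = flare month).
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b = 0.0
    w = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to b and each w_j.
        grad_b = 0.0
        grad_w = [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += resid / n
            for j in range(p):
                grad_w[j] += resid * xi[j] / n
        # Plain gradient step for the intercept, proximal (soft-threshold) step for weights.
        b -= step * grad_b
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
    return b, w


def selected_variables(weights, names):
    """Variables whose LASSO coefficient is nonzero, i.e. the 'selected' variables."""
    return [name for name, wj in zip(names, weights) if wj != 0.0]
```

Raising `lam` shrinks more coefficients to exactly zero, so the number of selected variables (20 claims variables, 15 NLP concepts, and so on in the results) depends on the penalty chosen.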
We extracted and labeled 839 months of follow-up (280 with gout flares). The claims-only model selected 20 variables (AUC = 0.69), and the NLP concept-only model selected 15 (AUC = 0.69). The combined model selected 32 claims variables and 13 NLP concepts (AUC = 0.73). The claims-only model had a PPV of 0.64 [0.50, 0.77] and an NPV of 0.71 [0.65, 0.76], whereas the combined model had a PPV of 0.76 [0.61, 0.88] and the same NPV of 0.71 [0.65, 0.76].
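The abstract reports AUC, PPV, and NPV but does not give the classification threshold behind the PPV/NPV values or the method used for the bracketed intervals. Under those gaps, the sketch below shows how such metrics can be computed from month-level predicted scores: AUC through its Mann-Whitney interpretation (the probability that a randomly chosen flare month scores higher than a randomly chosen non-flare month, ties counting one half), PPV/NPV at an assumed threshold, and a Wilson score interval as one assumed interval method. All function names are hypothetical, and the intervals are not expected to reproduce the published ones.

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC = P(score of an event month > score of a non-event month); ties count 1/2."""
    pos = [s for s, yv in zip(scores, labels) if yv == 1]
    neg = [s for s, yv in zip(scores, labels) if yv == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def predictive_values(scores, labels, threshold):
    """PPV and NPV when a month is called a flare month if its score >= threshold (assumed rule)."""
    tp = fp = tn = fn = 0
    for s, yv in zip(scores, labels):
        predicted = s >= threshold
        if predicted and yv == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif yv == 0:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv, (tp, fp, tn, fn)


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (an assumed CI method)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half
```

For a PPV interval, the successes are the true positives and the total is all predicted-positive months, e.g. `wilson_interval(tp, tp + fp)`; the NPV interval uses `tn` and `tn + fn` in the same way.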
Adding NLP concept variables to claims variables resulted in a small improvement in the identification of gout flares. Both our data-driven claims-only model and our combined claims/NLP-concept model outperformed existing rule-based claims algorithms that rely on medication use, diagnosis codes, and procedure codes.