Kaiser Permanente Southern California, Pasadena.
Arthritis Care Res (Hoboken). 2014 Nov;66(11):1740-8. doi: 10.1002/acr.22324.
Gout flares are not well documented by diagnosis codes, making it difficult to conduct accurate database studies. We implemented a computer-based method to automatically identify gout flares using natural language processing (NLP) and machine learning (ML) from electronic clinical notes.
From a cohort of 16,519 patients, we selected 2 separate sets of 100 patients; their 1,264 and 1,192 clinical notes served as the training and evaluation data sets, respectively, and were reviewed by rheumatologists. We created separate NLP searches to capture different aspects of gout flares. For each note, the NLP search outputs became the inputs to the ML system, which made the final classification decision. The note-level classifications were then grouped into patient-level gout flares. Our NLP+ML results were validated against a gold standard data set and compared with the claims-based method used in the prior literature.
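The three stages described above (NLP searches, note-level classification, patient-level grouping) can be sketched as follows. This is a minimal illustration, not the study's implementation: the search patterns, the scoring rule standing in for the ML system, and the 30-day window used to merge nearby positive notes into one flare are all assumptions, since the abstract does not specify them.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical search patterns; the abstract does not list the actual NLP searches.
SEARCHES = {
    "mentions_flare": re.compile(r"\bgout(y)?\s+(flare|attack)", re.I),
    "joint_symptom": re.compile(r"\b(pain|swelling|swollen)\b", re.I),
    "negated": re.compile(
        r"\b(no|denies|without)\s+(recent\s+)?(gout\s+)?(flare|attack)", re.I
    ),
}


def nlp_features(note_text):
    """Run every search on one note and return a 0/1 feature per search."""
    return {name: int(bool(p.search(note_text))) for name, p in SEARCHES.items()}


def classify_note(features):
    """Stand-in for the ML system: a hand-set linear score with a threshold.

    A real system would learn its decision rule from the reviewed training notes.
    """
    score = (
        2 * features["mentions_flare"]
        + features["joint_symptom"]
        - 3 * features["negated"]
    )
    return score >= 2


def count_patient_flares(notes, gap_days=30):
    """Group note-level positives into patient-level flare counts.

    notes: iterable of (patient_id, note_date, note_text).
    Positive notes more than gap_days apart count as separate flares
    (an assumed rule). Patients with no positive note are absent from the result.
    """
    positive_dates = defaultdict(list)
    for patient_id, note_date, text in notes:
        if classify_note(nlp_features(text)):
            positive_dates[patient_id].append(note_date)

    flares = {}
    for patient_id, dates in positive_dates.items():
        dates.sort()
        count = 1
        for prev, cur in zip(dates, dates[1:]):
            if (cur - prev).days > gap_days:
                count += 1
        flares[patient_id] = count
    return flares


example_notes = [
    ("p1", date(2014, 1, 1), "Patient presents with gout flare, left toe pain."),
    ("p1", date(2014, 1, 10), "Follow-up for gout attack; swelling improved."),
    ("p1", date(2014, 6, 1), "Gouty attack of right knee."),
    ("p2", date(2014, 3, 1), "Denies gout flare. No pain."),
]
patient_flares = count_patient_flares(example_notes)
```

Keeping the searches separate from the classifier mirrors the design in the abstract: each search contributes one input, and the classifier decides how to weigh them.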
For 16,519 patients with a diagnosis of gout and a prescription for a urate-lowering therapy, we identified 18,869 clinical notes as gout flare positive (sensitivity 82.1%, specificity 91.5%): 1,402 patients with ≥3 flares (sensitivity 93.5%, specificity 84.6%), 5,954 with 1 or 2 flares, and 9,163 with no flare (sensitivity 98.5%, specificity 96.4%). Our method identified more flare cases (18,869 versus 7,861) and patients with ≥3 flares (1,402 versus 516) when compared to the claims-based method.
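The arithmetic behind these figures can be checked with a short sketch. The cohort counts below are taken from the results above; the function names and the stratum labels are illustrative choices, and the confusion-matrix counts in the last two lines are toy values, not study data (the abstract reports only the resulting percentages).

```python
def sensitivity(true_pos, false_neg):
    """Share of true flare cases that the method flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-flare cases that the method correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


def flare_stratum(n_flares):
    """Map a patient's flare count to the three strata reported above."""
    if n_flares >= 3:
        return ">=3"
    if n_flares >= 1:
        return "1-2"
    return "none"


# Patient counts per stratum, as reported in the results.
strata = {">=3": 1402, "1-2": 5954, "none": 9163}
total_patients = sum(strata.values())  # should equal the 16,519-patient cohort

# NLP+ML versus claims-based counts, as reported in the results.
flare_case_ratio = 18869 / 7861      # roughly 2.4 times as many flare cases
frequent_flare_ratio = 1402 / 516    # roughly 2.7 times as many patients with >=3 flares

# Toy confusion counts, for illustration only.
toy_sensitivity = sensitivity(true_pos=8, false_neg=2)
toy_specificity = specificity(true_neg=9, false_pos=1)
```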
We developed a computer-based method (NLP and ML) to identify gout flares from clinical notes. The method was validated as an accurate tool for identifying gout flares, with higher sensitivity and specificity than those reported in previous studies.