Pakhomov Serguei, Weston Susan A, Jacobsen Steven J, Chute Christopher G, Meverden Ryan, Roger Véronique L
Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
Am J Manag Care. 2007 Jun;13(6 Part 1):281-8.
To identify patients with heart failure (HF) by using language contained in the electronic medical record (EMR).
We validated 2 methods of identifying HF through the EMR, in which clinical notes are transcribed within 24 hours of the encounter. The first method was natural language processing (NLP) of the EMR text. The second method was predictive modeling based on machine learning, using the text of clinical reports. Natural language processing was compared with both manual record review and billing records. Predictive modeling was compared with manual record review.
Natural language processing identified 2904 HF cases; billing records independently identified 1684 HF cases, 252 (15%) of which were not identified by NLP. Manual review of a random sample of these 252 cases found no HF, yielding 100% sensitivity (95% confidence interval [CI] = 86, 100) and 97.8% specificity (95% CI = 97.7, 97.9) for NLP. Manual review confirmed 1107 of the 2904 cases identified by NLP, yielding a positive predictive value (PPV) of 38% (95% CI = 36, 40). Predictive modeling yielded a PPV of 82% (95% CI = 73, 93), 56% sensitivity (95% CI = 46, 67), and 96% specificity (95% CI = 94, 99).
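The NLP PPV and the share of billing-only cases follow directly from the counts reported above. The sketch below recomputes them; the abstract does not state which interval method was used, so the normal-approximation (Wald) interval here is an assumption, and the helper name `proportion_with_wald_ci` is hypothetical. For these counts it happens to reproduce the reported 36 to 40 range after rounding.

```python
import math


def proportion_with_wald_ci(successes: int, total: int, z: float = 1.96):
    """Return (estimate, lower, upper) as fractions.

    Assumption: uses the normal-approximation (Wald) interval; the
    original study may have used a different method.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp the bounds to the valid [0, 1] range for a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# PPV of NLP: manual review confirmed 1107 of the 2904 NLP-identified cases.
ppv, lower, upper = proportion_with_wald_ci(1107, 2904)
print(f"NLP PPV = {ppv:.0%} (95% CI {lower:.0%}, {upper:.0%})")

# Billing-identified cases not found by NLP: 252 of 1684.
print(f"Billing-only share = {252 / 1684:.0%}")
```

The same helper would not reproduce the NLP sensitivity interval (86, 100), which was estimated from a review sample with an observed proportion of 100%; a Wald interval collapses to zero width at that boundary, so an exact or score-based interval is the usual choice there.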
The EMR can be used to identify HF via 2 complementary approaches. Natural language processing may be more suitable for studies requiring highest sensitivity, whereas predictive modeling may be more suitable for studies requiring higher PPV.