Xu Yuan, Lee Seungwon, Martin Elliot, D'souza Adam G, Doktorchik Chelsea T A, Jiang Jason, Lee Sangmin, Eastwood Cathy A, Fine Nowell, Hemmelgarn Brenda, Todd Kathryn, Quan Hude
Department of Oncology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Alberta Health Services, Calgary, Alberta, Canada.
J Card Fail. 2020 Jul;26(7):610-617. doi: 10.1016/j.cardfail.2020.04.003. Epub 2020 Apr 15.
Surveillance and outcome studies for heart failure (HF) require accurate identification of patients with HF. Algorithms based on International Classification of Diseases (ICD) codes to identify HF from administrative data are inadequate owing to their relatively low sensitivity. Detailed clinical information from electronic medical records (EMRs) is potentially useful for improving ICD algorithms. This study aimed to enhance the ICD algorithm for HF definition by incorporating comprehensive information from EMRs.
The study included 2106 inpatients in Calgary, Alberta, Canada. Medical chart review served as the reference gold standard for evaluating the developed algorithms. The commonly used ICD codes for defining HF formed the baseline (the ICD algorithm). The performance of several algorithms that used free-text discharge summaries from a population-based EMR was compared with that of the ICD algorithm. These algorithms included a keyword search algorithm looking for HF-specific terms, a machine learning-based HF concept (HFC) algorithm, an EMR structured data-based algorithm, and combined algorithms (such as the combined ICD and HFC algorithm).
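As a rough illustration of how a keyword search over discharge summaries and a combined rule might look, here is a minimal Python sketch. It is not the study's implementation: the keyword list, the function names, and the use of a simple OR rule for combining are assumptions, and the ICD check uses only the ICD-10 I50 (heart failure) category, whereas the study's ICD code list may be broader.

```python
import re

# Hypothetical keyword list; the abstract does not give the actual HF-specific terms.
HF_KEYWORDS = ["heart failure", "cardiac failure", "chf"]
HF_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in HF_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_flag(summary: str) -> bool:
    """Return True if the free-text summary mentions any HF keyword."""
    return HF_PATTERN.search(summary) is not None


def icd_flag(codes: list[str]) -> bool:
    """Return True if any code falls in ICD-10 category I50 (heart failure).

    Assumption: the study's ICD algorithm may use additional codes.
    """
    return any(code.upper().startswith("I50") for code in codes)


def combined_flag(icd_positive: bool, text_positive: bool) -> bool:
    """Assumed OR rule: a patient is flagged if either source is positive."""
    return icd_positive or text_positive


summary = "Admitted with decompensated heart failure; diuresed with good effect."
codes = ["E11.9"]  # no HF code recorded for this hypothetical admission
print(combined_flag(icd_flag(codes), keyword_flag(summary)))
```

A plain keyword match like this does not handle negation (for example, "no evidence of heart failure"), which is one way such an approach can lose PPV. An OR rule can only add positives, so it can raise sensitivity but tends to lower PPV.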
Of the 2106 patients, 296 (14.1%) had HF as determined by chart review. The ICD algorithm had a positive predictive value (PPV) of 92.4% but low sensitivity (57.4%). The EMR keyword search algorithm achieved a higher sensitivity (65.5%) than the ICD algorithm, but with a lower PPV (77.6%). The HFC algorithm achieved a higher sensitivity (80.0%) than both the ICD algorithm and the keyword algorithm while maintaining a reasonable PPV (88.9%). An even higher sensitivity (83.3%) was reached by combining the HFC and ICD algorithms, with a lower PPV (83.3%). The structured EMR data algorithm reached a sensitivity of 78% and a PPV of 54.2%. The combined EMR structured data and ICD algorithm had a higher sensitivity (82.4%), but the PPV remained low at 54.8%. Specificity across all algorithms ranged from 87.5% to 99.2%.
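For readers less familiar with these measures, the following Python sketch shows how sensitivity, PPV, and specificity are computed from a 2x2 table against chart review. The abstract does not report cell counts. The counts below are an assumption, back-calculated as one set of integers consistent with the reported ICD algorithm figures (296 HF cases among 2106 patients, sensitivity 57.4%, PPV 92.4%), and should not be read as the study's actual table.

```python
def validity_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard validity measures against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true HF cases that were flagged
        "ppv": tp / (tp + fp),          # share of flagged patients who truly have HF
        "specificity": tn / (tn + fp),  # share of non-HF patients left unflagged
    }


# Assumed counts (not reported in the abstract), chosen so that
# tp + fn = 296 chart-confirmed HF cases and the total is 2106 patients.
icd = validity_metrics(tp=170, fp=14, fn=126, tn=1796)

for name, value in icd.items():
    print(f"{name}: {value * 100:.1f}%")
# sensitivity: 57.4%
# ppv: 92.4%
# specificity: 99.2%
```

Under these assumed counts the specificity works out to 99.2%, which matches the upper end of the reported range.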
Applying natural language processing and machine learning to the discharge summaries in inpatient EMR data can improve the capture of HF cases compared with the widely used ICD algorithm. The HFC algorithm is straightforward to use, making it easy to apply for HF case identification.