Ghent University Hospital, Department of Intensive Care Medicine, Belgium.
Ghent University Hospital, Department of Intensive Care Medicine, Belgium; Ghent University, Faculty of Medicine and Health Sciences, Belgium.
J Crit Care. 2020 Apr;56:203-207. doi: 10.1016/j.jcrc.2020.01.007. Epub 2020 Jan 9.
Identification of patients for epidemiologic research through administrative coding has important limitations. We investigated the feasibility of a search based on natural language processing (NLP) on the text sections of electronic health records for identification of patients with septic shock.
Results of an explicit search strategy (using explicit concept retrieval) and a combined search strategy (using both explicit and implicit concept retrieval) were compared with hospital ICD-9-based administrative coding and with our department's own prospectively compiled infection database.
Of 8911 patients admitted to the medical or surgical ICU, 1023 (11.5%) had septic shock according to the combined search strategy. This was significantly more than the number identified by the explicit strategy (518, 5.8%), by hospital administrative coding (549, 5.8%) or by our own prospectively compiled database (609, 6.8%) (p < .001). Sensitivity and specificity of the automated combined search strategy were 72.7% (95% CI 69.0%-76.2%) and 93.0% (95% CI 92.4%-93.6%), compared to 56.0% (95% CI 52.0%-60.0%) and 97.5% (95% CI 97.1%-97.8%) for hospital administrative coding.
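As a rough illustration of how the reported sensitivity and specificity relate to patient counts, the sketch below computes both from a 2x2 table, with a Wilson score interval for each. It rests on assumptions not stated in the abstract: that the prospectively compiled database (609 patients) served as the reference standard, and that the cell counts can be back-calculated from the reported totals and percentages. The function names are hypothetical, and the Wilson interval is one common choice; the abstract does not say which interval method the authors used, so the bounds may differ slightly from the published ones.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95% coverage)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# ASSUMPTION: illustrative counts back-calculated from the abstract's totals
# (8911 admissions, 609 reference positives, 1023 flagged by the combined
# search), not figures reported by the authors.
tp, fn = 443, 166      # reference-positive patients: flagged / missed
fp, tn = 580, 7722     # reference-negative patients: flagged / not flagged

sens, spec = sensitivity_specificity(tp, fp, fn, tn)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity {sens:.1%} (95% CI {sens_ci[0]:.1%}-{sens_ci[1]:.1%})")
print(f"specificity {spec:.1%} (95% CI {spec_ci[0]:.1%}-{spec_ci[1]:.1%})")
```

Under these assumed counts the point estimates round to the reported 72.7% and 93.0%, which suggests the back-calculation is at least consistent with the abstract.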
An automated search strategy based on a combination of explicit and implicit concept retrieval is a feasible way to screen electronic health records for septic shock, and it outperforms an explicit approach based on administrative coding.