Hospices Civils de Lyon, Hôpital de la Croix-Rousse, Unité d'hygiène et d'épidémiologie, F-69317 Lyon, France.
BMC Med Inform Decis Mak. 2011 Jul 28;11:50. doi: 10.1186/1472-6947-11-50.
Identifying patients who pose an epidemic hazard when they are admitted to a health facility helps prevent hospital-acquired infection. An automated clinical decision support system to detect suspected cases, based on the principle of syndromic surveillance, is being developed at the University of Lyon's Hôpital de la Croix-Rousse. This tool will analyse structured data and narrative reports from computerized emergency department (ED) medical records. The first step is to develop an application (UrgIndex) that automatically extracts and encodes information found in narrative reports. The purpose of the present article is to describe and evaluate this natural language processing system.
Narrative reports have to be pre-processed before the French-language medical multi-terminology indexer (ECMT) can be used for standardized encoding. UrgIndex identifies and excludes syntagmas containing a negation and replaces non-standard terms (abbreviations, acronyms, spelling errors, etc.) with standard ones. The resulting phrases are then sent to the ECMT through an Internet connection. The indexer's reply, in Extensible Markup Language (XML), returns the codes and literals corresponding to the concepts found in the phrases. UrgIndex then filters the codes, keeping those that correspond to suspected infections. Recall is defined as the number of relevant processed medical concepts divided by the number of concepts evaluated (coded manually by the medical epidemiologist). Precision is defined as the number of relevant processed concepts divided by the number of concepts proposed by UrgIndex. Recall and precision were assessed for respiratory and cutaneous syndromes.
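The pre-processing step described above can be illustrated with a minimal Python sketch. It is not the UrgIndex implementation: the negation cues, the replacement table, the punctuation-based phrase splitting and all function names are assumptions made for illustration, and the call to the ECMT is deliberately left out because its interface is not described here.

```python
import re

# Hypothetical lexicons: the actual UrgIndex negation cues and
# replacement table are not given in the abstract.
NEGATION_CUES = ("pas de", "absence de", "sans", "aucun", "aucune")
REPLACEMENTS = {"sd": "syndrome", "tx": "toux", "dyspnee": "dyspnée"}


def split_phrases(report):
    """Split a narrative report into phrases on simple punctuation (a crude assumption)."""
    return [p.strip() for p in re.split(r"[.;,\n]+", report) if p.strip()]


def is_negated(phrase):
    """Return True if the phrase contains one of the assumed negation cues."""
    lowered = phrase.lower()
    return any(
        re.search(r"\b" + re.escape(cue) + r"\b", lowered)
        for cue in NEGATION_CUES
    )


def normalize(phrase):
    """Lowercase the phrase and replace non-standard tokens with standard terms."""
    tokens = phrase.lower().split()
    return " ".join(REPLACEMENTS.get(token, token) for token in tokens)


def preprocess(report):
    """Drop negated phrases, then normalize the remaining ones."""
    return [normalize(p) for p in split_phrases(report) if not is_negated(p)]


if __name__ == "__main__":
    text = "Tx fébrile depuis 3 jours, pas de dyspnee. Eruption cutanée"
    print(preprocess(text))
```

In this sketch the negated phrase "pas de dyspnee" is discarded before normalization, so only the affirmed findings would be passed on for indexing.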
Evaluation of 1,674 processed medical concepts contained in 100 ED medical records (50 for respiratory syndromes and 50 for cutaneous syndromes) showed an overall recall of 85.8% (95% CI: 84.1-87.3). Recall was 84.5% for respiratory syndromes and 87.0% for cutaneous syndromes. The most frequent reason a concept was not processed was non-recognition of the term by UrgIndex (9.7%). Overall precision was 79.1% (95% CI: 77.3-80.8). Precision was 81.4% for respiratory syndromes and 77.0% for cutaneous syndromes.
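As an illustration of how such figures are computed, the sketch below applies the recall and precision definitions given above and attaches a 95% confidence interval to a proportion. The Wilson score interval is an assumption, since the method used for the reported intervals is not stated, and the counts in the example are invented for illustration only; they are not the study's data.

```python
import math


def recall(relevant_processed, evaluated):
    """Relevant processed concepts divided by manually coded (evaluated) concepts."""
    return relevant_processed / evaluated


def precision(relevant_processed, proposed):
    """Relevant processed concepts divided by concepts proposed by the system."""
    return relevant_processed / proposed


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 for roughly 95% coverage)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Invented counts, for illustration only.
    relevant, evaluated, proposed = 90, 100, 120
    low, high = wilson_interval(relevant, evaluated)
    print(f"recall = {recall(relevant, evaluated):.3f} (95% CI: {low:.3f}-{high:.3f})")
    print(f"precision = {precision(relevant, proposed):.3f}")
```

Recall and precision share the same numerator but have different denominators, which is why each needs its own interval.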
This study demonstrates the feasibility and value of an automated method for extracting and encoding medical concepts from ED narrative reports, the first step required for detecting potentially infectious patients at epidemic risk.