Rosier Arnaud, Burgun Anita, Mabo Philippe
School of Medicine, University of Rennes 1, IFR 140, Rennes, France.
AMIA Annu Symp Proc. 2008 Nov 6;2008:81-5.
This study evaluated natural language processing methods to extract clinical data from free text in surgical reports related to cardiac pacing and defibrillation in order to populate a registry.
The information extraction system that we developed is a named entity recognition system based on GATE using regular expressions. We analyzed 232 reports. For each report, we performed manual abstraction, collected the information stored in the registry, and performed information extraction with our system. Sensitivity, positive predictive value, and accuracy were used to evaluate our method.
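As a rough illustration of regular-expression-based extraction of this kind, the sketch below uses Python's standard `re` module rather than GATE. The field names, patterns, and sample sentence are all hypothetical; the abstract does not list the actual rules or report wording used in the study.

```python
import re

# Hypothetical patterns for three kinds of fields (decimal value, integer
# value, coded string). These are assumptions for illustration only, not the
# rules used in the study.
PATTERNS = {
    "pacing_threshold_v": re.compile(
        r"threshold\s*(?:of|:)?\s*(\d+(?:\.\d+)?)\s*V\b", re.IGNORECASE
    ),
    "impedance_ohm": re.compile(
        r"impedance\s*(?:of|:)?\s*(\d+)\s*ohms?\b", re.IGNORECASE
    ),
    "pacing_mode": re.compile(r"\b(DDDR?|VVIR?|AAIR?)\b"),
}


def extract_fields(report: str) -> dict:
    """Return the first match of each pattern found in the report text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        if match:
            found[name] = match.group(1)
    return found


# Invented sample sentence, not taken from the study's reports.
sample = (
    "Ventricular lead: threshold 0.5 V, impedance 620 ohms. "
    "Device programmed in DDDR mode."
)
fields = extract_fields(sample)
print(fields)
```

Fields with no match are simply absent from the result, which lets a downstream comparison against the manual abstraction treat them as missed values.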
Our system extracted information, including numeric data, text, and combinations of numbers and strings, with high sensitivity (>90%) and very high positive predictive value (>95%). Its precision was higher than that of the original registry database populated by manual input. Conclusion: This tool, based on GATE open source components, provides a robust approach to extracting information from documents in a specific narrow domain such as pacemaker reports. Further evaluation is needed for application to broader domains.
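For readers who want to see how the reported measures relate to one another, here is a minimal sketch of computing sensitivity, positive predictive value, and accuracy against a manual gold standard. The counting convention (a wrong value counts as a false positive) and the toy data are assumptions; the abstract does not specify how the study scored individual fields.

```python
def evaluate(extracted, gold):
    """Compare extracted values with gold-standard values, slot by slot.

    Assumed convention (not stated in the abstract):
      TP: a value was extracted and equals the gold value
      FP: a value was extracted but differs from gold, or gold is empty
      FN: gold has a value but nothing was extracted
      TN: both are empty (None)
    """
    tp = fp = fn = tn = 0
    for ext, ref in zip(extracted, gold):
        if ext is None and ref is None:
            tn += 1
        elif ext is None:
            fn += 1
        elif ext == ref:
            tp += 1
        else:
            fp += 1
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy data, invented for illustration: five field slots from one report.
extracted = ["0.5", "620", None, "VVI", None]
gold = ["0.5", "620", "DDD", "DDD", None]
metrics = evaluate(extracted, gold)
print(metrics)
```

The same function could be applied to the registry entries in place of the system output, which is one way to compare the two sources against the same manual abstraction.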