Friedman Carol, Shagina Lyudmila, Lussier Yves, Hripcsak George
Department of Biomedical Informatics, Columbia University, 622 West 168 Street, VC-5, New York, NY 10032, USA.
J Am Med Inform Assoc. 2004 Sep-Oct;11(5):392-402. doi: 10.1197/jamia.M1552. Epub 2004 Jun 7.
The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method.
An existing NLP system, MedLEE, was adapted to automatically generate codes. The method matches the structured output generated by MedLEE, which consists of findings and modifiers, to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts.
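The idea of choosing the most specific code for a finding and its modifiers can be illustrated with a minimal sketch. This is not MedLEE's actual algorithm or data model: the table layout, the function name `most_specific_code`, and the codes `X001` to `X003` are all hypothetical, and the codes are not real UMLS identifiers. The sketch assumes that a code applies when all of its required modifiers are present, and that the applicable code with the most required modifiers is the most specific one.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical coding table: (finding, required modifiers, code).
# The codes are made up for illustration and are not UMLS identifiers.
CODE_TABLE: List[Tuple[str, FrozenSet[str], str]] = [
    ("pain", frozenset(), "X001"),
    ("pain", frozenset({"chest"}), "X002"),
    ("pain", frozenset({"chest", "severe"}), "X003"),
]


def most_specific_code(finding: str, modifiers: Set[str]) -> Optional[str]:
    """Return the code whose required modifiers are all present and most numerous.

    Returns None when no table entry matches the finding.
    """
    best_code: Optional[str] = None
    best_size = -1
    for entry_finding, required, code in CODE_TABLE:
        # An entry applies if the finding matches and every required
        # modifier appears among the extracted modifiers.
        if entry_finding == finding and required <= modifiers:
            if len(required) > best_size:
                best_code, best_size = code, len(required)
    return best_code
```

Under these assumptions, a finding of "pain" with modifiers "chest" and "left" would map to the chest-pain entry, because no entry requires "left"; a finding absent from the table yields no code.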
Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91.
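For readers unfamiliar with the two measures, the following minimal sketch shows the standard definitions of recall and precision from counts. The function names and the counts in the usage note are illustrative assumptions; they are not the study's data, and the sketch does not reproduce how the study derived its confidence intervals.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of reference-standard items that the system found."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no reference items")
    return true_positives / total


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of system-generated items that were judged correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("precision is undefined when the system produced no items")
    return true_positives / total
```

As a made-up example, a system that finds 77 of 100 reference terms has a recall of 77/100, and a system for which 89 of 100 generated codes are validated has a precision of 89/100.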
Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.