Applying natural language processing toolkits to electronic health records - an experience report.

Author information

Barrett Neil, Weber-Jahnke Jens H

Affiliation

Department of Computer Science, University of Victoria, Victoria, BC, Canada.

Publication information

Stud Health Technol Inform. 2009;143:441-6.

PMID:19380974

Abstract

A natural language challenge devised by Informatics for Integrating Biology and the Bedside (i2b2) was to analyze free-text health data to construct a multi-class, multi-label classification system focused on obesity and its co-morbidities. This report presents a case study in which a natural language processing (NLP) toolkit, NLTK, was used in the challenge. The report briefly reviews NLP in the context of electronic health record (EHR) applications, surveys and contrasts some existing NLP toolkits, and describes our experiences with the i2b2 case study. Our efforts uncovered issues including the lack of human-annotated physician notes for use as NLP training data, differences between conventional free text and medical notes, and potential hardware and software limitations affecting future projects.
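To make the phrase "multi-class, multi-label" concrete, the following is a minimal sketch and not the authors' system. It does not use NLTK; it relies only on the Python standard library to show the shape of the problem: each note receives one label per disease (multi-label), and each label is chosen from several classes (multi-class). The disease names, the toy notes, the class codes `Y`/`N`/`U`, and all function and class names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization; real clinical notes need far more care."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over bag-of-words tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for word in tokenize(doc):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_per_label(docs, annotations):
    """Train one multi-class classifier per disease (the multi-label part)."""
    diseases = annotations[0].keys()
    return {d: TinyNaiveBayes().fit(docs, [a[d] for a in annotations]) for d in diseases}


def classify(models, doc):
    """Return one class per disease for a single note."""
    return {disease: model.predict(doc) for disease, model in models.items()}


# Hypothetical toy data: Y = present, N = absent, U = unmentioned.
DOCS = [
    "obese patient with diabetes",
    "patient denies diabetes",
    "obese patient",
    "no obesity and no diabetes",
]
ANNOTATIONS = [
    {"obesity": "Y", "diabetes": "Y"},
    {"obesity": "U", "diabetes": "N"},
    {"obesity": "Y", "diabetes": "U"},
    {"obesity": "N", "diabetes": "N"},
]

models = train_per_label(DOCS, ANNOTATIONS)
result = classify(models, "obese patient")
print(result)
```

Training an independent classifier per disease is only one possible decomposition. With four toy notes the predictions carry no real meaning, which echoes the abstract's point about the scarcity of annotated training data.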
