Enhanced information retrieval from narrative German-language clinical text documents using automated document classification.

Authors

Spat Stephan, Cadonna Bruno, Rakovac Ivo, Gütl Christian, Leitner Hubert, Stark Günther, Beck Peter

Affiliation

Institute of Medical Technologies and Health Management, Joanneum Research Forschungsgesellschaft mbH, Graz, Austria.

Publication

Stud Health Technol Inform. 2008;136:473-8.

PMID:18487776

Abstract

The number of narrative clinical text documents stored in the Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend considerable time searching these documents for relevant patient-related information to support medical decision making. Efficient and topical retrieval of such information is therefore an important task in an EPR system. This paper describes the prototype of a medical information retrieval system (MIRS) for clinical text documents. The prototype is implemented with the open-source information retrieval framework Apache Lucene. In addition, a multi-label classification system based on the open-source data mining framework WEKA generates metadata from the clinical text document set. This metadata is used to influence the rank order of the documents retrieved by physicians. Combining information retrieval with automated document classification offers an enhanced way for physicians and, in the near future, patients to express their information needs against the content of an EPR. The system has been designed as a J2EE Web application. First findings are based on a sample of 18,000 unstructured clinical text documents written in German.
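
The abstract does not say how the classification metadata changes the rank order, so the following is only an illustrative sketch of one plausible scheme, not the authors' method. It uses plain Python rather than the Lucene or WEKA APIs: a simple TF-IDF sum stands in for the retrieval score, hand-assigned label sets stand in for the output of the multi-label classifier, and a multiplicative boost is applied to documents whose labels match the categories the user cares about. All names (`tf_idf_scores`, `rerank`, `boost`, the sample documents and labels) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming, an assumption for brevity)."""
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, tf in counts.items():
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for c in counts.values() if term in c)
            if df:
                score += tf[term] * math.log(1 + n_docs / df)
        scores[doc_id] = score
    return scores


def rerank(scores, labels, preferred, boost=2.0):
    """Multiply the score of documents whose predicted labels overlap
    the preferred labels, then return matching doc ids, best first."""
    adjusted = {}
    for doc_id, score in scores.items():
        matches = labels.get(doc_id, set()) & preferred
        adjusted[doc_id] = score * (boost if matches else 1.0)
    hits = [d for d in adjusted if adjusted[d] > 0]
    return sorted(hits, key=lambda d: (-adjusted[d], d))


# Toy German-language documents (invented for illustration).
docs = {
    "d1": "patient mit diabetes mellitus typ 2 unter insulin therapie",
    "d2": "diabetes kontrolle blutzucker stabil diabetes schulung empfohlen",
    "d3": "fraktur des linken unterarms keine komplikationen",
}

# Stand-in for classifier output: one label set per document (multi-label).
labels = {
    "d1": {"endocrinology", "discharge_letter"},
    "d2": {"lab_report"},
    "d3": {"surgery"},
}

scores = tf_idf_scores("diabetes", docs)
plain_ranking = rerank(scores, labels, preferred=set())
boosted_ranking = rerank(scores, labels, preferred={"discharge_letter"}, boost=3.0)
```

With no preferred labels the ranking follows the text score alone, so `d2` (two occurrences of the query term) precedes `d1`. Once discharge letters are preferred with a boost of 3.0, `d1` moves to the top. A multiplicative boost was chosen here because it leaves non-matching documents in their original relative order. An additive bonus or a hard filter would be equally valid alternatives.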
