Stenzhorn Holger, Pacheco Edson José, Nohama Percy, Schulz Stefan
Institute for Medical Biometry and Medical Informatics, University Medical Center, 79104 Freiburg, Germany.
Stud Health Technol Inform. 2009;150:228-32.
Clinical documentation must be fine-grained to represent a patient's history, development, and treatment faithfully. However, natural language, the main information carrier, suffers from idiosyncratic terminology, spelling and grammar errors, and a lack of grammatical structure. Coding systems such as ICD-10 were introduced in response, but their use varies widely among physicians, and codes are often assigned incompletely or incorrectly. The almost exponential growth of clinical data is a further problem. We present a new methodology for processing these data: by combining several natural language processing methods, we extract morphemes from clinical texts and map them onto SNOMED CT concepts. We first manually analyzed clinical texts received from a university hospital and evaluated the issues found in them. Based on this analysis, we implemented a prototype system that incorporates both the OpenNLP and MorphoSaurus natural language processing systems.
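The morpheme-to-concept idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' system and does not use the OpenNLP or MorphoSaurus APIs. The subword lexicon, the `#...` identifiers, the `CONCEPT_...` labels, and the function names `segment` and `map_to_concept` are all hypothetical placeholders. In particular, the labels are not real SNOMED CT codes. Segmentation here is a simple greedy longest match, chosen only for brevity.

```python
# Toy subword lexicon: surface morpheme -> hypothetical meaning identifier.
# Every entry is invented for illustration; None marks a linking element
# that carries no meaning of its own.
SUBWORD_LEXICON = {
    "gastr": "#stomach",
    "enter": "#intestine",
    "hepat": "#liver",
    "cardi": "#heart",
    "my": "#muscle",
    "pathy": "#disease",
    "itis": "#inflammation",
    "o": None,
}

# Hypothetical concept table: set of meaning identifiers -> placeholder label.
# A real system would look up SNOMED CT concepts here instead.
CONCEPT_TABLE = {
    frozenset({"#stomach", "#intestine", "#inflammation"}): "CONCEPT_GASTROENTERITIS",
    frozenset({"#liver", "#inflammation"}): "CONCEPT_HEPATITIS",
    frozenset({"#heart", "#muscle", "#disease"}): "CONCEPT_CARDIOMYOPATHY",
}


def segment(word):
    """Split a word into lexicon morphemes by greedy longest match.

    Returns a list of morphemes, or None if some part of the word
    cannot be matched. No backtracking is attempted.
    """
    word = word.lower()
    morphemes = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, then progressively shorter ones.
        for end in range(len(word), pos, -1):
            if word[pos:end] in SUBWORD_LEXICON:
                morphemes.append(word[pos:end])
                pos = end
                break
        else:
            return None  # no morpheme starts at this position
    return morphemes


def map_to_concept(word):
    """Map a word to a placeholder concept label via its morphemes."""
    morphemes = segment(word)
    if morphemes is None:
        return None
    meanings = frozenset(
        SUBWORD_LEXICON[m] for m in morphemes if SUBWORD_LEXICON[m] is not None
    )
    return CONCEPT_TABLE.get(meanings)


if __name__ == "__main__":
    for term in ["Gastroenteritis", "hepatitis", "cardiomyopathy"]:
        print(term, segment(term), map_to_concept(term))
```

Because the lookup key is an unordered set of meaning identifiers, morpheme order and linking vowels do not affect the result. That keeps the toy version simple, but it also discards information that a real mapping would likely need.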