Kim Youngho, Choi Jinwook
Interdisciplinary Program of Bioengineering, College of Engineering, Seoul National University, Seoul, Korea.
Healthc Inform Res. 2011 Sep;17(3):150-5. doi: 10.4258/hir.2011.17.3.150. Epub 2011 Sep 30.
Acquiring temporal information is important because knowledge in clinical narratives is time-sensitive. In this paper, we describe an approach that can be used to extract the temporal information found in Korean clinical narrative texts.
We developed a two-stage system consisting of an exhaustive text analysis phase and a temporal expression recognition phase. Because a single token in the target documents may join Korean and English text, the first phase uses a corpus-based approach to decompose complex tokens, separating the minimal semantic units from the concatenated phrases and linguistic derivations within each token. A finite state machine is then applied to the minimal semantic units to find phrases that carry time-related information.
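The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corpus-based decomposition is replaced here by a greedy longest-match against a tiny hand-written lexicon, and the lexicon entries, unit classes, state names, and transition table are all hypothetical choices made for illustration.

```python
import re

# Hypothetical toy lexicon standing in for corpus-derived semantic units.
TIME_UNITS = {"년", "월", "일"}            # year, month, day
RELATIONS = {"전", "후", "부터", "까지"}    # before, after, from, until
LEXICON = TIME_UNITS | RELATIONS | {"내원", "수술"}

# Stage 1a: split a token into runs of digits, Latin letters, Hangul, or punctuation.
SCRIPT_RUN = re.compile(r"\d+|[A-Za-z]+|[\uac00-\ud7a3]+|[^\w\s]")


def split_hangul(run):
    """Stage 1b: greedy longest-match segmentation of a Hangul run."""
    units, i = [], 0
    while i < len(run):
        for j in range(len(run), i, -1):
            if run[i:j] in LEXICON:
                units.append(run[i:j])
                i = j
                break
        else:
            units.append(run[i])  # unknown syllable kept as its own unit
            i += 1
    return units


def decompose(token):
    """Break one mixed Korean/English token into minimal units."""
    units = []
    for run in SCRIPT_RUN.findall(token):
        if "\uac00" <= run[0] <= "\ud7a3":
            units.extend(split_hangul(run))
        else:
            units.append(run)
    return units


def classify(unit):
    if unit.isdigit():
        return "NUM"
    if unit in TIME_UNITS:
        return "UNIT"
    if unit in RELATIONS:
        return "REL"
    return "OTHER"


# Stage 2: a small finite state machine over unit classes (assumed pattern:
# one or more NUM+UNIT pairs, optionally followed by a relation word).
TRANSITIONS = {
    ("START", "NUM"): "NUM",
    ("NUM", "UNIT"): "UNIT",
    ("UNIT", "NUM"): "NUM",
    ("UNIT", "REL"): "REL",
}
ACCEPTING = {"UNIT", "REL"}


def find_temporal(units):
    """Return the longest accepted unit sequences, scanning left to right."""
    spans, i = [], 0
    while i < len(units):
        state, end, j = "START", None, i
        while j < len(units):
            state = TRANSITIONS.get((state, classify(units[j])))
            if state is None:
                break
            j += 1
            if state in ACCEPTING:
                end = j  # remember the last accepting position
        if end is None:
            i += 1
        else:
            spans.append(units[i:end])
            i = end
    return spans
```

For example, `find_temporal(decompose("POD3일전"))` first separates the English abbreviation, the number, and the two Korean units, and then the state machine keeps only the part that forms a time phrase. A real system would need a far larger lexicon and richer transitions to cover the expression types found in discharge summaries.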
In the experiment, we extracted the temporal expressions in Korean clinical narratives using our system. Performance was evaluated on 100 discharge summaries from Seoul National University Hospital, which contain a total of 805 temporal expressions. The system achieved a phrase-level precision of 0.895 and a phrase-level recall of 0.919.
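As a reminder of how phrase-level precision and recall are typically computed, the sketch below scores predicted phrases against gold-standard phrases. It assumes exact span matching, which the abstract does not specify, and the `(doc_id, start, end)` span representation and the toy data are hypothetical, not taken from the study.

```python
def precision_recall(predicted, gold):
    """Phrase-level scores, counting a prediction as correct only on exact span match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy spans as (doc_id, start, end); illustrative only.
gold_spans = {(1, 0, 3), (1, 5, 8), (2, 2, 4)}
predicted_spans = {(1, 0, 3), (2, 2, 4), (2, 7, 9)}
p, r = precision_recall(predicted_spans, gold_spans)
```

Precision is the share of extracted phrases that are correct, and recall is the share of annotated phrases that the system found.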
Finding information in Korean clinical narratives is a challenging task because the text is written in both Korean and English and frequently omits syntactic elements and word spacing, which makes it extremely noisy. This study presents an effective method for acquiring the temporal information found in Korean clinical documents.