Li Min, Patrick Jon
School of IT, the University of Sydney, Sydney, NSW, Australia.
AMIA Annu Symp Proc. 2012;2012:542-51. Epub 2012 Nov 3.
A method for automatically extracting clinical temporal information would be of significant practical importance for deep medical language understanding, and is key to building applications such as medical decision making and medical question answering. This paper proposes a rich statistical model for extracting temporal information from an extremely noisy clinical corpus. Besides the common linguistic, contextual and semantic features, two further elements are integrated into a supervised machine-learning approach: highly restricted training sample expansion, and the structure distance between a temporal expression and its related event expressions. The learning method achieves an F-score of almost 80% in the extraction of five temporal classes, and an F-score of nearly 75% in identifying temporally related events. This process has been integrated into the document-processing component of an implemented clinical question answering system that focuses on answering patient-specific questions (see demonstration at http://hitrl.cs.usyd.edu.au/ICNS/).
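For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a per-class F-score can be computed from gold and predicted labels. It assumes the balanced F1 measure (the abstract does not state which F-score variant was used), and the function name `f_scores`, the label names `DATE` and `DURATION`, and the `O` "no temporal class" label are hypothetical placeholders, not the paper's five temporal classes or its evaluation code.

```python
from collections import Counter


def f_scores(gold, predicted, ignore="O"):
    """Return {label: (precision, recall, f1)} for two parallel label lists.

    Assumption: balanced F1. The `ignore` label marks tokens that carry
    no temporal class and is not scored as a class of its own.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            if g != ignore:
                tp[g] += 1          # correct prediction of a real class
        else:
            if p != ignore:
                fp[p] += 1          # predicted a class that is not in gold
            if g != ignore:
                fn[g] += 1          # missed a class that is in gold

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data with hypothetical labels, for illustration only.
gold = ["DATE", "O", "DURATION", "DATE", "O"]
pred = ["DATE", "DATE", "O", "DATE", "O"]
scores = f_scores(gold, pred)
```

In this toy example, `DATE` has two true positives and one false positive, so its precision is 2/3, its recall is 1, and its F1 is 0.8. The figures are unrelated to the results reported in the paper.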