Health Information Technology Research Laboratory, School of IT, Faculty of Engineering and IT, The University of Sydney, Sydney, NSW 2006, Australia.
J Biomed Inform. 2012 Apr;45(2):292-306. doi: 10.1016/j.jbi.2011.11.008. Epub 2011 Nov 28.
Many studies have addressed question classification in the open domain, but only limited work focuses on the medical domain. Moreover, to the best of our knowledge, most medical question classification schemes were designed for literature-based question answering systems. This paper takes a new direction: designing a question processing and classification model for answering clinical questions over electronic patient notes.
The work has four main steps. First, a relatively large set of clinical questions was collected from staff in an Intensive Care Unit. Second, a clinical question taxonomy was designed for question answering purposes. Third, an annotation guideline was created and used to annotate the question set. Finally, a multilayer classification model was built to classify the clinical questions.
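To make the idea of a multilayer classifier concrete, the following is a minimal sketch, not the model built in this work. It assumes two layers (answerable versus unanswerable, then a subclass for answerable questions), a toy keyword-count scorer in place of a real machine learning classifier, and invented example questions and labels (`lab_result`, `medication`); none of these come from the study's taxonomy or data.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, drop question marks, split on whitespace."""
    return text.lower().replace("?", " ").split()


class KeywordClassifier:
    """Toy stand-in for a real classifier: a label's score is the sum of
    training counts of the question's tokens under that label."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        for text, label in examples:
            self.counts[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = tokenize(text)
        scores = {
            label: sum(counter[t] for t in tokens)
            for label, counter in self.counts.items()
        }
        return max(scores, key=scores.get)


# Hypothetical annotated questions: (text, layer-1 label, layer-2 label).
TRAIN = [
    ("What is the patient's latest potassium level?", "answerable", "lab_result"),
    ("What was the last creatinine result?", "answerable", "lab_result"),
    ("Is the patient on any antibiotics?", "answerable", "medication"),
    ("Which medication was started yesterday?", "answerable", "medication"),
    ("Will the family agree to surgery?", "unanswerable", None),
    ("Should we hire more nurses?", "unanswerable", None),
]

# Layer 1 sees every question; layer 2 is trained only on answerable ones.
layer1 = KeywordClassifier().fit([(t, top) for t, top, _ in TRAIN])
layer2 = KeywordClassifier().fit(
    [(t, sub) for t, top, sub in TRAIN if top == "answerable"]
)


def classify(question):
    """Route a question through layer 1, then layer 2 if it is answerable."""
    top = layer1.predict(question)
    if top != "answerable":
        return (top, None)
    return (top, layer2.predict(question))


result_lab = classify("What is the latest creatinine level?")
result_unanswerable = classify("Will the family agree to more nurses?")
```

The point of the layering is that each lower-layer classifier only has to separate the classes under one parent, which keeps each decision small when the data set is small and the class count is high.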
The initial classification experiments showed that general features alone cannot yield high performance for a minimum classifier (a small data set with multiple classes). An automatic knowledge discovery and knowledge reuse process was therefore designed to boost performance by extracting and expanding the specific features of the questions. In the evaluation, accuracy of around 90% was achieved for answerable subclass classification and for generic question template classification. In contrast, the machine learning method does not perform well at identifying the category of unanswerable questions, owing to the asymmetric distribution.
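One way to picture feature expansion is mapping specific terms to shared concept features, so that questions with different surface words become comparable. The sketch below assumes a small hand-written lexicon and invented concept tags (`CONCEPT_LAB_TEST`, `CONCEPT_DRUG`); the abstract does not describe how the actual knowledge discovery process derives or reuses such knowledge, so this is only an illustration of the general idea.

```python
# Hypothetical lexicon: specific clinical terms mapped to generic concept tags.
LEXICON = {
    "potassium": "CONCEPT_LAB_TEST",
    "creatinine": "CONCEPT_LAB_TEST",
    "vancomycin": "CONCEPT_DRUG",
    "heparin": "CONCEPT_DRUG",
}


def expand_features(question, lexicon=LEXICON):
    """Return the question's tokens plus a concept tag for each known term."""
    tokens = question.lower().replace("?", " ").split()
    features = set(tokens)
    for token in tokens:
        if token in lexicon:
            features.add(lexicon[token])
    return features


a = expand_features("What is the potassium level?")
b = expand_features("Was creatinine checked today?")

# The two questions share no surface tokens, only the expanded concept tag.
shared = a & b
```

With a small data set, a classifier rarely sees the same specific term twice; a shared tag gives it a feature that recurs across questions.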
This paper presents a comprehensive study of clinical questions. A major outcome of this work is the multilayer classification model, which will serve as a major component of a patient-record-based clinical question answering system as our studies continue. In addition, the question collection can be reused by the research community to improve the efficiency of their own question answering systems.