University of Wisconsin-Milwaukee, 2400 Hartford Avenue, Milwaukee, WI 53201, USA.
J Biomed Inform. 2010 Dec;43(6):962-71. doi: 10.1016/j.jbi.2010.07.007. Epub 2010 Jul 27.
Clinicians pose complex clinical questions when seeing patients, and finding the answers to those questions in a timely manner helps improve the quality of patient care. We report here on two natural language processing models, automatic topic assignment and keyword identification, that together automatically and effectively extract information needs from ad hoc clinical questions. Our study is motivated by the development of the larger clinical question answering system AskHERMES (Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS).
We developed supervised machine-learning systems to automatically assign predefined general categories (e.g. etiology, procedure, and diagnosis) to a question. We also explored both supervised and unsupervised systems to automatically identify keywords that capture the main content of the question.
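The abstract does not name the features or learning algorithms behind these systems. Purely as an illustration, the sketch below assumes a bag-of-words multinomial Naive Bayes classifier for supervised topic assignment and a TF-IDF score for unsupervised keyword ranking. It also assumes one topic per question. The tokenizer, the toy questions, and the names `tokenize`, `NaiveBayesTopicClassifier`, and `tfidf_keywords` are hypothetical and do not describe the AskHERMES implementation.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,?;:()"


def tokenize(text):
    """Hypothetical tokenizer: lowercase and strip surrounding punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumes a single topic per question; the real task may allow several.
    """

    def fit(self, questions, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for question, label in zip(questions, labels):
            tokens = tokenize(question)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, question):
        tokens = tokenize(question)
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_label / self.total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def tfidf_keywords(question, corpus, k=3, stopwords=frozenset()):
    """Rank a question's tokens by TF-IDF against a corpus of questions."""
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_sets)
    tf = Counter(t for t in tokenize(question) if t not in stopwords)

    def idf(term):
        df = sum(1 for doc in doc_sets if term in doc)
        return math.log((n_docs + 1) / (df + 1)) + 1  # smoothed IDF

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
    return ranked[:k]


# Toy training data, invented for illustration (not from the paper's corpus).
questions = [
    "What is the cause of chronic cough in this child?",
    "What causes elevated liver enzymes?",
    "How do I perform a lumbar puncture?",
    "What is the procedure for draining an abscess?",
    "How do I diagnose pulmonary embolism?",
    "What test confirms a diagnosis of lupus?",
]
labels = ["etiology", "etiology", "procedure", "procedure", "diagnosis", "diagnosis"]

clf = NaiveBayesTopicClassifier().fit(questions, labels)
print(clf.predict("What causes chronic fatigue?"))  # etiology
print(clf.predict("How do I perform a biopsy?"))  # procedure
print(tfidf_keywords("What causes chronic cough in children?", questions,
                     k=2, stopwords={"what", "in"}))  # ['children', 'causes']
```

Naive Bayes and TF-IDF are chosen here only because each fits in a few lines of standard-library code; a production system would likely use richer features (for example, domain vocabularies) and a stronger classifier.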
We evaluated our systems on 4654 annotated clinical questions that were collected in practice. We achieved an F1 score of 76.0% for general topic classification and 58.0% for keyword extraction. Our systems have been integrated into the larger question answering system AskHERMES. Our error analyses suggested that inconsistent annotation in our training data has hurt performance on both question analysis tasks.
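The abstract reports F1 without stating how it is averaged. Assuming that predicted and annotated keywords are compared as sets for each question and that counts are pooled across questions (micro-averaging, an assumption), F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The function name `micro_f1` and the example annotations are hypothetical.

```python
def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 over per-question sets of keywords (or topics)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)  # extracted and annotated
        fp += len(pred - gold)  # extracted but not annotated
        fn += len(gold - pred)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for two questions.
gold = [{"cough", "child"}, {"liver enzymes"}]
pred = [{"cough"}, {"liver enzymes", "elevated"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Here P = 2/3 and R = 2/3, so F1 = 2/3. Macro-averaging (computing F1 per question or per category and then averaging) would generally give a different number, which is why the averaging scheme matters when comparing reported scores.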
Our systems, available at http://www.askhermes.org, can automatically extract information needs from both short questions (fewer than 20 word tokens) and long questions (more than 20 word tokens), and from both well-structured and ill-formed questions. We speculate that the performance of general topic classification and keyword extraction could be further improved if consistently annotated data were made available.