University of Illinois Chicago, IL, USA.
LHNCBC, U.S. National Library of Medicine, MD, USA.
J Biomed Inform. 2022 Apr;128:104040. doi: 10.1016/j.jbi.2022.104040. Epub 2022 Mar 6.
Searching for health information online is becoming customary for a growing number of consumers, which makes the need for efficient and reliable question answering systems more pressing. An important contributor to the success of these systems is their ability to fully understand consumers' questions. However, these questions are frequently longer than needed and mention peripheral information that is not useful for finding relevant answers. Question summarization is one potential way to simplify long and complex consumer questions before attempting to find an answer. In this paper, we study the task of abstractive summarization for real-world consumer health questions. We develop an abstractive question summarization model that leverages the semantic interpretation of a question via recognition of medical entities, which enables the generation of informative summaries. To this end, we propose multiple Cloze tasks (i.e., tasks of filling in missing words in a given context) to identify the key medical entities, which pushes the model toward better coverage in question-focus recognition. Additionally, we infuse the decoder inputs with question-type information to generate question-type-driven summaries. When evaluated on the MeQSum benchmark corpus, our framework outperformed the state-of-the-art method by 10.2 ROUGE-L points. We also conducted a manual evaluation to assess the correctness of the generated summaries.
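As a rough illustration of the two ideas named in the abstract, the sketch below builds Cloze-style examples by masking medical entities in a question and prefixes a decoder input with a question-type tag. It is not the authors' implementation: the `[MASK]` token, the function names `make_cloze_examples` and `add_question_type`, the hand-supplied entity list, the tag format, and the sample question are all assumptions made for the example. The abstract does not specify how entities are recognized or how the question type is encoded.

```python
import re

MASK = "[MASK]"  # assumed mask token, not taken from the paper


def make_cloze_examples(question, entities):
    """Build one (masked_question, answer) pair per entity found.

    Only the first occurrence of each entity is masked. Entities are
    supplied by hand here; a real system would obtain them from a
    medical entity recognizer.
    """
    examples = []
    for entity in entities:
        pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
        match = pattern.search(question)
        if match is None:
            continue  # entity not mentioned in this question
        masked = question[:match.start()] + MASK + question[match.end():]
        examples.append((masked, match.group(0)))
    return examples


def add_question_type(summary_tokens, question_type):
    """Prefix decoder input tokens with a question-type tag (assumed format)."""
    return ["<" + question_type + ">"] + list(summary_tokens)


if __name__ == "__main__":
    question = "I have been taking metformin for diabetes, what are the side effects?"
    for masked, answer in make_cloze_examples(question, ["metformin", "diabetes"]):
        print(masked, "->", answer)
    print(add_question_type(["side", "effects", "of", "metformin"], "information"))
```

Each masked question paired with its answer is the kind of training signal a Cloze objective would use to make a model attend to the question focus, while the type tag gives the decoder an explicit cue about the kind of summary to produce.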