A question-entailment approach to question answering.

Author information

Lister Hill Center, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD, USA.

Publication information

BMC Bioinformatics. 2019 Oct 22;20(1):511. doi: 10.1186/s12859-019-3119-4.

Abstract

BACKGROUND

One of the challenges in large-scale information retrieval (IR) is developing fine-grained and domain-specific methods to answer natural language questions. Despite the availability of numerous sources and datasets for answer retrieval, Question Answering (QA) remains a challenging problem due to the difficulty of the question understanding and answer extraction tasks. One of the promising tracks investigated in QA is mapping new questions to formerly answered questions that are "similar".

RESULTS

We propose a novel QA approach based on Recognizing Question Entailment (RQE) and we describe the QA system and resources that we built and evaluated on real medical questions. First, we compare logistic regression and deep learning methods for RQE using different kinds of datasets including textual inference, question similarity, and entailment in both the open and clinical domains. Second, we combine IR models with the best RQE method to select entailed questions and rank the retrieved answers. To study the end-to-end QA approach, we built the MedQuAD collection of 47,457 question-answer pairs from trusted medical sources which we introduce and share in the scope of this paper. Following the evaluation process used in TREC 2017 LiveQA, we find that our approach exceeds the best results of the medical task with a 29.8% increase over the best official score.
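The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the real system learns RQE with logistic regression or deep models and uses full IR engines over MedQuAD, whereas here the entailment score is a simple token-overlap stand-in and the QA collection is a toy illustrative list.

```python
# Sketch of the IR + RQE answer-ranking idea: map a new question onto
# previously answered questions it entails, then return their answers.

def tokens(text):
    """Lowercased whitespace tokens of a question."""
    return set(text.lower().split())

def rqe_score(new_q, answered_q):
    """Stand-in entailment score: Jaccard overlap of question tokens.
    (The paper learns this with logistic regression / deep models.)"""
    a, b = tokens(new_q), tokens(answered_q)
    return len(a & b) / len(a | b) if a | b else 0.0

def answer(new_q, qa_pairs, threshold=0.2):
    """Keep answered questions whose entailment score clears the
    threshold and rank their answers by descending score."""
    scored = [(rqe_score(new_q, q), ans) for q, ans in qa_pairs]
    scored = [p for p in scored if p[0] >= threshold]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [ans for _, ans in scored]

# Toy MedQuAD-style question-answer pairs (illustrative, not real data).
qa_pairs = [
    ("what are the symptoms of type 2 diabetes", "Common symptoms include..."),
    ("how is hypertension treated", "Treatment options include..."),
]
print(answer("what symptoms does type 2 diabetes cause", qa_pairs))
```

In the paper, the retrieval step narrows candidates before RQE scoring; the sketch collapses that into one pass over the collection for brevity.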

CONCLUSIONS

The evaluation results support the relevance of question entailment for QA and highlight the effectiveness of combining IR and RQE for future QA efforts. Our findings also show that relying on a restricted set of reliable answer sources can bring a substantial improvement in medical QA.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a08/6805558/8292b21c6bef/12859_2019_3119_Fig1_HTML.jpg
