SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.

Affiliations

Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD.

National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco.

Publication information

Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.

Abstract

BACKGROUND AND OBJECTIVE

Question answering (QA), the identification of short, accurate answers to users' questions written in natural language, is a longstanding problem that has been widely studied in the open domain over the last decades. However, it remains a real challenge in the biomedical domain: most existing systems support a limited number of question and answer types and still require further effort to improve their precision on the questions they do support. Here, we present a semantic biomedical QA system named SemBioNLQA, which can handle yes/no, factoid, list, and summary natural language questions.

METHODS

This paper describes the architecture and evaluation of SemBioNLQA, an end-to-end biomedical QA system that consists of question classification, document retrieval, passage retrieval and answer extraction modules. It takes natural language questions as input and outputs both short, precise answers and summaries. The system handles four types of questions and is based on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) the PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words and UMLS concepts for passage retrieval, and (4) the UMLS Metathesaurus, BioPortal synonyms, sentiment analysis and a term frequency metric for answer extraction.
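As an illustration of the passage retrieval step, the following is a minimal sketch of BM25 ranking over candidate passages. It is not taken from the SemBioNLQA source code: the function names, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example, and the abstract states that the actual system works with stemmed words and UMLS concepts rather than raw tokens.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # The system described above uses stemmed words and UMLS concepts instead.
    return text.lower().split()


def bm25_rank(query, passages, k1=1.2, b=0.75):
    """Return (ranking, scores) for passages under a standard BM25 formula.

    k1 and b are common default values, assumed here for illustration.
    """
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    if n == 0:
        return [], []
    avgdl = sum(len(d) for d in docs) / n

    # Document frequency: in how many passages each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with length normalization.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
            score += idf * norm
        scores.append(score)

    # Indices of passages sorted from highest to lowest score.
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranking, scores


passages = [
    "aspirin reduces fever",
    "the weather is nice today",
    "aspirin aspirin treats pain",
]
ranking, scores = bm25_rank("aspirin fever", passages)
print(ranking)
```

In this toy example the passage that matches both query terms ranks first and the unrelated passage scores zero. A real pipeline would apply this scoring to passages drawn from the documents returned by the document retrieval step.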

RESULTS AND CONCLUSION

Compared with current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, has the potential to deal with a large number of question and answer types. SemBioNLQA quickly addresses users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name, etc.) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid and list questions, whereas it provides only ideal answers for summary questions. Moreover, experimental evaluations performed on biomedical questions and answers provided by the BioASQ challenge, in particular in 2015, 2016 and 2017 (as part of our participation), show that SemBioNLQA achieves good performance compared with the most recent state-of-the-art systems and offers a practical and competitive alternative to help information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.
