A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering.

Author information

Sarrouti Mourad, Ouatik El Alaoui Said

Affiliation

Laboratory of Computer Science and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco.

Publication information

J Biomed Inform. 2017 Apr;68:96-103. doi: 10.1016/j.jbi.2017.03.001. Epub 2017 Mar 7.

DOI:10.1016/j.jbi.2017.03.001

PMID:28286031

Abstract

BACKGROUND AND OBJECTIVE

Passage retrieval, the identification of top-ranked passages that may contain the answer to a given biomedical question, is a crucial component of any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge that has been widely studied over the last decades. However, it still requires further effort in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on sentence-length passages obtained with Stanford CoreNLP, a probabilistic information retrieval (IR) model, and UMLS concepts.

METHODS

In the proposed method, we first use our document retrieval system, which is based on the PubMed search engine and UMLS similarity, to retrieve documents relevant to a given biomedical question. We then take the abstracts of the retrieved documents and use the Stanford CoreNLP sentence splitter to produce a set of sentences, i.e., candidate passages. Finally, using stemmed words and UMLS concepts as features for the BM25 model, we compute the similarity score between the biomedical question and each candidate passage and keep the N top-ranked ones.
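The final scoring step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes candidate passages are already split into sentences, replaces real stemming with naive plural stripping, and replaces UMLS concept mapping with a hypothetical two-entry phrase table (`TOY_CONCEPTS`). The function names and the parameter defaults `k1=1.2` and `b=0.75` are illustrative choices, not values taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical phrase-to-concept table standing in for a real UMLS lookup.
TOY_CONCEPTS = {
    "heart attack": "C0027051",
    "myocardial infarction": "C0027051",
}


def features(text):
    """Return crude word stems plus any toy concept IDs found in the text."""
    lowered = text.lower()
    words = re.findall(r"[a-z0-9]+", lowered)
    # Naive plural stripping; a real system would use a proper stemmer.
    stems = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    concepts = [cui for phrase, cui in TOY_CONCEPTS.items() if phrase in lowered]
    return stems + concepts


def bm25_rank(question, passages, n=2, k1=1.2, b=0.75):
    """Score each passage against the question with BM25; return the top n."""
    docs = [features(p) for p in passages]
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Number of passages in which each feature occurs at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(features(question))
    scored = []
    for passage, doc in zip(passages, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, passage))
    # Stable sort: passages with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

In this toy setting, a question about "heart attack" can match a passage that only mentions "myocardial infarction", because both phrases map to the same concept identifier. That is the intuition behind adding concepts alongside stemmed words as BM25 features.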

RESULTS

Experimental evaluations performed on large standard datasets provided by the BioASQ challenge show that the proposed method performs well compared with current state-of-the-art methods, significantly outperforming them by an average of 6.84% in terms of mean average precision (MAP).
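For readers unfamiliar with the metric, the following sketch shows the textbook definition of mean average precision. It is an illustration only: the function names are hypothetical, and the official BioASQ evaluation may normalize average precision differently.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Each element of `runs` pairs the ranked passage identifiers returned for one question with the identifiers judged relevant for that question.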

CONCLUSION

We have proposed an efficient passage retrieval method which can be used to retrieve relevant passages in biomedical QA systems with high mean average precision.

Similar articles

A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering.

J Biomed Inform. 2017 Apr;68:96-103. doi: 10.1016/j.jbi.2017.03.001. Epub 2017 Mar 7.

SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.

Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.

UMLS knowledge for biomedical language processing.

Bull Med Libr Assoc. 1993 Apr;81(2):184-94.

A knowledge based method for the medical question answering problem.

Comput Biol Med. 2007 Oct;37(10):1511-21. doi: 10.1016/j.compbiomed.2007.01.013. Epub 2007 Mar 19.

Word embeddings and external resources for answer processing in biomedical factoid question answering.

J Biomed Inform. 2019 Apr;92:103118. doi: 10.1016/j.jbi.2019.103118. Epub 2019 Feb 10.

A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering.

Methods Inf Med. 2017 May 18;56(3):209-216. doi: 10.3414/ME16-01-0116. Epub 2017 Mar 31.

Deep learning-based approach for Arabic open domain question answering.

PeerJ Comput Sci. 2022 May 4;8:e952. doi: 10.7717/peerj-cs.952. eCollection 2022.

Bayesian approach to incorporating different types of biomedical knowledge bases into information retrieval systems for clinical decision support in precision medicine.

J Biomed Inform. 2019 Oct;98:103238. doi: 10.1016/j.jbi.2019.103238. Epub 2019 Jul 10.

Words or concepts: the features of indexing units and their optimal use in information retrieval.

Proc Annu Symp Comput Appl Med Care. 1993:685-9.

Cited by

Question answering systems for health professionals at the point of care-a systematic review.

J Am Med Inform Assoc. 2024 Apr 3;31(4):1009-1024. doi: 10.1093/jamia/ocae015.

Diversity Learning Based on Multi-Latent Space for Medical Image Visual Question Generation.

Sensors (Basel). 2023 Jan 17;23(3):1057. doi: 10.3390/s23031057.

A reproducible experimental survey on biomedical sentence similarity: A string-based method sets the state of the art.

PLoS One. 2022 Nov 21;17(11):e0276539. doi: 10.1371/journal.pone.0276539. eCollection 2022.

Towards a unified search: Improving PubMed retrieval with full text.

J Biomed Inform. 2022 Oct;134:104211. doi: 10.1016/j.jbi.2022.104211. Epub 2022 Sep 21.

HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey.

BMC Bioinformatics. 2022 Jan 6;23(1):23. doi: 10.1186/s12859-021-04539-0.

Protocol for a reproducible experimental survey on biomedical sentence similarity.

PLoS One. 2021 Mar 24;16(3):e0248663. doi: 10.1371/journal.pone.0248663. eCollection 2021.

Survey on evaluation methods for dialogue systems.

Artif Intell Rev. 2021;54(1):755-810. doi: 10.1007/s10462-020-09866-x. Epub 2020 Jun 25.

List-wise learning to rank biomedical question-answer pairs with deep ranking recursive autoencoders.

PLoS One. 2020 Nov 9;15(11):e0242061. doi: 10.1371/journal.pone.0242061. eCollection 2020.

Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records.

BMC Med Inform Decis Mak. 2020 Apr 30;20(Suppl 1):73. doi: 10.1186/s12911-020-1044-0.

LitSense: making sense of biomedical literature at sentence level.

Nucleic Acids Res. 2019 Jul 2;47(W1):W594-W599. doi: 10.1093/nar/gkz289.
