A COVID-19 Search Engine (CO-SE) with Transformer-based architecture.

Author information

Raza Shaina

Affiliation

Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Publication information

Healthc Anal (N Y). 2022 Nov;2:100068. doi: 10.1016/j.health.2022.100068. Epub 2022 Jun 6.

Abstract

Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. The rapidly growing literature on COVID-19 makes it hard to obtain precise, up-to-date information about the virus. Practitioners, front-line workers, and researchers need methods suited to their expertise to stay current on scientific knowledge and research findings, yet the volume of papers being written on the subject makes it difficult to keep up with the most recent research. This problem motivates the design of the COVID-19 Search Engine (CO-SE), an algorithmic system that finds relevant documents for each user query and answers complex questions by searching a large corpus of publications. CO-SE has a retriever component, built on a TF-IDF vectorizer, that retrieves the relevant documents from the corpus. It also has a reader component, a Transformer-based model that reads the paragraphs of the retrieved documents and finds the answers related to the query. The proposed model outperforms previous models, obtaining an exact match ratio of 71.45% and a semantic answer similarity score of 78.55%. It also performs strongly on other benchmark datasets, demonstrating the generalizability of the proposed approach.
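
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the CO-SE implementation: the class name `TfidfRetriever`, the tokenizer, the smoothed IDF formula, and the three-sentence corpus are all assumptions made for illustration. It ranks documents by cosine similarity between TF-IDF vectors of the query and of each document, using only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Toy TF-IDF retriever (hypothetical; not the CO-SE code)."""

    def __init__(self, documents):
        self.documents = list(documents)
        tokenized = [tokenize(doc) for doc in self.documents]
        n_docs = len(tokenized)
        # Document frequency: number of documents that contain each term.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed inverse document frequency (an assumed, common variant).
        self.idf = {
            term: math.log((1 + n_docs) / (1 + count)) + 1.0
            for term, count in df.items()
        }
        self.doc_vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens):
        """Build an L2-normalized sparse TF-IDF vector as a dict."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def retrieve(self, query, top_k=2):
        """Return up to top_k (document, score) pairs with a positive score."""
        q = self._vectorize(tokenize(query))
        scored = []
        for index, doc_vec in enumerate(self.doc_vectors):
            # Cosine similarity reduces to a dot product of unit vectors.
            score = sum(w * doc_vec.get(t, 0.0) for t, w in q.items())
            scored.append((score, index))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.documents[i], s) for s, i in scored[:top_k] if s > 0]


# Made-up corpus for illustration only.
corpus = [
    "SARS-CoV-2 is the virus that causes COVID-19.",
    "Vaccines reduce the risk of severe COVID-19 illness.",
    "TF-IDF weights terms by frequency and rarity.",
]
retriever = TfidfRetriever(corpus)
results = retriever.retrieve("Which virus causes COVID-19?", top_k=1)
print(results[0][0])
```

In a retriever-reader pipeline of the kind the abstract describes, the passages returned by `retrieve` would then be passed to a Transformer-based reader that locates the answer within them. That step is omitted here because the abstract does not specify the model.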



