Raza Shaina
Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
Healthc Anal (N Y). 2022 Nov;2:100068. doi: 10.1016/j.health.2022.100068. Epub 2022 Jun 6.
Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Because the literature on COVID-19 is growing rapidly, it is hard to obtain precise, up-to-date information about the virus. Practitioners, front-line workers, and researchers need expert-oriented methods to stay current on scientific knowledge and research findings, yet the sheer volume of papers being published on the subject makes it difficult to keep up with the most recent work. This problem motivates us to propose the design of the COVID-19 Search Engine (CO-SE), an algorithmic system that finds relevant documents for each user query and answers complex questions by searching a large corpus of publications. CO-SE has a retriever component, built on a TF-IDF vectorizer, that retrieves the relevant documents from the corpus, and a reader component, a Transformer-based model that reads the paragraphs of the retrieved documents and extracts answers to the query. The proposed model outperforms previous models, obtaining an exact match ratio of 71.45% and a semantic answer similarity score of 78.55%. It also outperforms previous models on other benchmark datasets, demonstrating the generalizability of the proposed approach.
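The retrieve-then-read design summarized above can be illustrated with a small, standard-library-only Python sketch. It is not the CO-SE implementation: it assumes a plain TF-IDF weighting with scikit-learn-style smoothed IDF, cosine similarity for ranking documents, and SQuAD-style answer normalization for the exact match ratio. The names `TfidfRetriever`, `exact_match`, `exact_match_ratio`, and the toy `DOCS` corpus are hypothetical, and the Transformer reader is omitted.

```python
import math
import re
import string
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Hypothetical TF-IDF retriever: ranks documents by cosine similarity to a query."""

    def __init__(self, docs):
        self.docs = docs
        tokenized = [tokenize(d) for d in docs]
        n = len(docs)
        df = Counter()
        for toks in tokenized:
            df.update(set(toks))  # document frequency: count each term once per document
        # Smoothed IDF, mirroring scikit-learn's default (smooth_idf=True).
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, toks):
        """Build an L2-normalized sparse TF-IDF vector; out-of-vocabulary terms are dropped."""
        tf = Counter(toks)
        vec = {t: cnt * self.idf[t] for t, cnt in tf.items() if t in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def retrieve(self, query, top_k=2):
        """Return (doc_index, cosine_score) pairs for the top_k documents."""
        q = self._vectorize(tokenize(query))
        scores = [
            (sum(w * dvec.get(t, 0.0) for t, w in q.items()), i)
            for i, dvec in enumerate(self.vectors)
        ]
        scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
        return [(i, score) for score, i in scores[:top_k]]


def normalize_answer(s):
    """SQuAD-style normalization: lowercase, strip punctuation and articles, squash spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(pred, gold):
    return normalize_answer(pred) == normalize_answer(gold)


def exact_match_ratio(preds, golds):
    """Fraction of predictions that exactly match their gold answers after normalization."""
    return sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)


# Toy corpus for illustration only.
DOCS = [
    "SARS-CoV-2 is the virus that causes COVID-19.",
    "Masks reduce transmission of respiratory viruses.",
    "Vaccines train the immune system to recognize the spike protein.",
]

if __name__ == "__main__":
    retriever = TfidfRetriever(DOCS)
    for idx, score in retriever.retrieve("Which virus causes COVID-19?", top_k=2):
        print(idx, round(score, 3), DOCS[idx])
```

In a full pipeline, the top-ranked paragraphs would be passed to an extractive question-answering reader (for example, a Hugging Face Transformers `pipeline("question-answering")` model), and its predicted spans would be scored against reference answers with a metric such as `exact_match_ratio`. The semantic answer similarity score reported above relies on a learned similarity model and is not sketched here.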