Multistage BiCross encoder for multilingual access to COVID-19 health information.

Affiliation

Department of Computer Science, University of Sheffield, Sheffield, United Kingdom.

Publication information

PLoS One. 2021 Sep 7;16(9):e0256874. doi: 10.1371/journal.pone.0256874. eCollection 2021.

DOI:10.1371/journal.pone.0256874

PMID:34492073

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8423231/

Abstract

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has created a need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents in multiple languages. To address this challenge, this paper proposes a novel high-precision, high-recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm together with a transformer-based bi-encoder and cross-encoder to rank documents with respect to a given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and show that it outperforms other state-of-the-art approaches on nearly all evaluation metrics for both monolingual and bilingual runs.
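The control flow of such a three-stage pipeline can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the stages run in the order listed (BM25, then bi-encoder, then cross-encoder), with each stage passing a shorter candidate list to the next. The BM25 function follows the standard Okapi formula, but `toy_bi_encoder_score` and `toy_cross_encoder_score` are hypothetical lexical stand-ins for the transformer models, and the function names, cut-offs (`keep_stage1`, `keep_stage2`), and parameters `k1` and `b` are illustrative choices, not values from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1 scorer: Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def toy_bi_encoder_score(query, doc):
    """Hypothetical stand-in for a bi-encoder: query and document are
    embedded independently (here as term-count vectors) and compared
    with cosine similarity."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def toy_cross_encoder_score(query, doc):
    """Hypothetical stand-in for a cross-encoder: the (query, document)
    pair is scored jointly, here by shared bigrams plus shared terms."""
    q_tokens, d_tokens = tokenize(query), tokenize(doc)
    shared_bigrams = set(zip(q_tokens, q_tokens[1:])) & set(zip(d_tokens, d_tokens[1:]))
    shared_terms = set(q_tokens) & set(d_tokens)
    return len(shared_bigrams) + 0.1 * len(shared_terms)


def top_k(candidates, scores, k):
    """Return the k candidates with the highest scores, best first."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:k]]


def multistage_rank(query, docs, keep_stage1=3, keep_stage2=2):
    """Retrieve with BM25, narrow with the bi-encoder stand-in, then
    produce the final order with the cross-encoder stand-in."""
    stage1 = top_k(docs, bm25_scores(query, docs), keep_stage1)
    stage2 = top_k(stage1, [toy_bi_encoder_score(query, d) for d in stage1], keep_stage2)
    final_scores = [toy_cross_encoder_score(query, d) for d in stage2]
    return top_k(stage2, final_scores, len(stage2))
```

The design idea this sketch tries to convey is cost-driven: the cheap lexical scorer sees the whole collection, while the most expensive pairwise scorer only sees the few candidates that survive the earlier stages. In a real system the two stand-ins would be replaced by trained multilingual transformer models.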
