Department of Computer Science, University of Sheffield, Sheffield, United Kingdom.
PLoS One. 2021 Sep 7;16(9):e0256874. doi: 10.1371/journal.pone.0256874. eCollection 2021.
The coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has created a need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents in multiple languages. To address this challenge, this paper proposes a novel neural Multistage BiCross encoder approach that targets both high precision and high recall. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm, a transformer-based bi-encoder, and a transformer-based cross-encoder to rank documents with respect to a given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and show that it outperforms other state-of-the-art approaches on nearly all evaluation metrics in both monolingual and bilingual runs.
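The three-stage structure described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The BM25 stage follows the standard Okapi formula, but `bi_encoder_score` and `cross_encoder_score` are hypothetical stand-ins (bag-of-words cosine similarity and query-term coverage) for the transformer models. The function names, the cut-offs `keep_stage1` and `keep_stage2`, and the BM25 parameters `k1` and `b` are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1 scorer: Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = (sum(len(t) for t in tokenized) / n) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def bi_encoder_score(query, doc):
    """Stage 2 stand-in (assumption): query and document are 'encoded'
    independently as bag-of-words vectors and compared by cosine similarity.
    A real bi-encoder would use transformer sentence embeddings."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def cross_encoder_score(query, doc):
    """Stage 3 stand-in (assumption): the pair is scored jointly, here as the
    fraction of query terms covered by the document. A real cross-encoder
    would feed the concatenated pair through a transformer."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def top_k(indices, scores, k):
    """Return the k indices with the highest scores, best first."""
    ranked = sorted(zip(indices, scores), key=lambda p: p[1], reverse=True)
    return [i for i, _ in ranked[:k]]


def multistage_rank(query, docs, keep_stage1=100, keep_stage2=10):
    """Narrow the candidate set stage by stage; return document indices."""
    everything = list(range(len(docs)))
    stage1 = top_k(everything, bm25_scores(query, docs), keep_stage1)
    stage2 = top_k(
        stage1, [bi_encoder_score(query, docs[i]) for i in stage1], keep_stage2
    )
    return top_k(
        stage2, [cross_encoder_score(query, docs[i]) for i in stage2], len(stage2)
    )


if __name__ == "__main__":
    corpus = [
        "covid vaccine side effects",
        "football match results",
        "covid symptoms and vaccine safety information",
        "cooking pasta recipe",
    ]
    print(multistage_rank("covid vaccine", corpus, keep_stage1=3, keep_stage2=2))
```

The design point the sketch tries to convey is the cost ordering: the cheapest scorer sees the whole collection, and each later, more expensive scorer only sees the survivors of the previous stage.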