Public Health Ontario (PHO), Toronto, ON, Canada.
Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
BMC Bioinformatics. 2022 Jun 2;23(1):210. doi: 10.1186/s12859-022-04751-6.
Because the COVID-19 research literature is growing rapidly, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to help researchers and practitioners mine this literature and answer COVID-19-related questions in a timely manner.
This paper introduces CoQUAD, a question-answering system that efficiently extracts answers to COVID-19-related questions. Two datasets are provided in this work: a reference-standard dataset built from the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by experts from the public health domain. CoQUAD has a Retriever component based on the BM25 algorithm, which searches the reference-standard dataset for documents relevant to a COVID-19-related question. CoQUAD also has a Reader component consisting of a Transformer-based model, namely MPNet, which reads the paragraphs of the retrieved documents and finds the answers to the question. In comparison to previous works, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics.
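The retrieval step described above can be illustrated with a minimal sketch of BM25 ranking. This is not the CoQUAD implementation: the class name `BM25`, the regex tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the three toy documents are all assumptions chosen for illustration, and the IDF uses one common smoothed variant of the formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Toy BM25 ranker over an in-memory list of documents (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        # k1 and b are commonly used defaults, not values taken from the paper.
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF that stays non-negative for very frequent terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation with document-length normalization.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        # Indices of the k highest-scoring documents, best first.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


docs = [
    "COVID vaccines reduce severe illness",
    "BM25 ranks documents by term overlap",
    "Long COVID symptoms include fatigue",
]
retriever = BM25(docs)
best = retriever.top_k("What are long COVID symptoms?", k=1)
```

In a retriever-reader pipeline of the kind the abstract describes, the documents returned by `top_k` would then be passed to the Reader model, which extracts an answer span from them.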
Extensive experiments on the CoQUAD Retriever and Reader modules show that CoQUAD provides effective and relevant answers to COVID-19-related questions posed in natural language. CoQUAD outperforms state-of-the-art baseline models, achieving an exact match score of 77.50% and an F1 score of 77.10%.
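For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how exact match and token-level F1 are commonly computed for extractive question answering. It assumes the SQuAD-style convention (lowercasing, stripping punctuation and English articles before comparison); the abstract does not state the exact normalization used for CoQUAD, and the function names here are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    # Assumed SQuAD-style normalization: lowercase, drop punctuation and articles.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts tokens shared by both answers.
    common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The fever.", "fever")
f1 = f1_score("fever and cough", "fever")
```

Dataset-level scores are typically the mean of these per-question values, expressed as percentages.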
CoQUAD is a question-answering system that mines the COVID-19 literature using natural language processing techniques to help the research community locate the most recent findings and answer related questions.