

CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice.

Affiliations

Public Health Ontario (PHO), Toronto, ON, Canada.

Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Publication information

BMC Bioinformatics. 2022 Jun 2;23(1):210. doi: 10.1186/s12859-022-04751-6.

Abstract

BACKGROUND

Because the COVID-19 research literature keeps growing, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to help researchers and practitioners mine this literature and answer COVID-19-related questions in a timely manner.

METHODS

This paper introduces CoQUAD, a question-answering system that efficiently extracts answers to COVID-19-related questions. Two datasets are provided in this work: a reference-standard dataset built using the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by experts from the public health domain. CoQUAD has a Retriever component based on the BM25 algorithm, which searches the reference-standard dataset for documents relevant to a COVID-19-related question. CoQUAD also has a Reader component, a Transformer-based model named MPNet, which reads the paragraphs of the retrieved documents and finds the answers to the question. In contrast to previous work, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics.
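The Retriever step described above can be illustrated with a minimal sketch of BM25 ranking. This is not the CoQUAD implementation: the class name `BM25`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the three toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


class BM25:
    """Illustrative Okapi BM25 scorer (hypothetical helper, not CoQUAD's code)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        # Smoothed IDF variant that stays non-negative.
        self.idf = {
            term: math.log((n - c + 0.5) / (c + 0.5) + 1.0)
            for term, c in df.items()
        }

    def score(self, query, i):
        # BM25 score of document i for the query.
        tf, dl = self.tf[i], len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=2):
        # Indices of the k highest-scoring documents.
        ranked = sorted(
            range(len(self.tokens)),
            key=lambda i: self.score(query, i),
            reverse=True,
        )
        return ranked[:k]


# Toy corpus (invented for this sketch).
docs = [
    "vaccines reduce severe covid-19 outcomes",
    "masks limit airborne transmission of the virus",
    "long covid symptoms include fatigue and brain fog",
]
retriever = BM25(docs)
best = retriever.top_k("what are long covid symptoms", k=1)
print(best)
```

In a retriever-reader pipeline of the kind the abstract describes, the top-ranked documents would then be passed to the Reader model for answer extraction.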

RESULTS

Extensive experiments on the CoQUAD Retriever and Reader modules show that CoQUAD provides relevant answers to COVID-19-related questions posed in natural language. Compared with state-of-the-art baselines, CoQUAD outperforms previous models, achieving an exact match ratio of 77.50% and an F1 score of 77.10%.
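The abstract does not define how the exact match and F1 scores are computed. As a hedged illustration, the sketch below assumes the SQuAD-style definitions commonly used for extractive question answering: answers are normalized (lowercased, punctuation and articles removed), exact match checks string equality, and F1 is computed over overlapping tokens. The function names and the example strings are assumptions, not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text):
    # Lowercase, drop punctuation and English articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(pred) == normalize(gold))


def f1_score(pred, gold):
    # Token-level F1 between the predicted and the gold answer.
    p = normalize(pred).split()
    g = normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


# Invented example answers, for illustration only.
em = exact_match("The fever.", "fever")
f1 = f1_score("fever and cough", "fever")
print(em, f1)
```

Under these assumed definitions, a dataset-level score is the mean of the per-question values, expressed as a percentage.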

CONCLUSION

CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer any related questions.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d34/9161540/4e22186c7f5c/12859_2022_4751_Fig1_HTML.jpg
