Alzubi Jafar A, Jain Rachna, Singh Anubhav, Parwekar Pritee, Gupta Meenu
Al-Balqa Applied University, Salt, Jordan.
Bharati Vidyapeeth's College of Engineering, New Delhi, India.
Arab J Sci Eng. 2021 Jun 23:1-11. doi: 10.1007/s13369-021-05810-5.
The COVID-19 pandemic had infected 62.5 million people and caused nearly 1.46 million deaths worldwide as of November 2020. The rapidly evolving situation has made it hard to obtain accurate, on-demand, up-to-date information about the virus. Frontline workers in particular, including medical professionals, policymakers, and clinical scientists, need specialized methods to keep up with the literature and learn of the latest research findings. The risks are not trivial, since decisions based on fallacious answers may endanger public trust, well-being, and safety. With thousands of research papers being published on the topic, keeping track of the latest research is increasingly difficult. To address these challenges, we propose COBERT, a retriever-reader dual algorithmic system that answers complex queries by searching a corpus of 59K coronavirus-related papers made accessible through the Coronavirus Open Research Dataset Challenge (CORD-19). The retriever is a TF-IDF vectorizer that captures the top 500 documents with optimal scores. The reader, a Bidirectional Encoder Representations from Transformers (BERT) model pre-trained on the SQuAD 1.1 dev dataset and built on top of the HuggingFace BERT transformers, refines the sentences from the filtered documents. These are then passed to a ranker, which compares the logit scores to produce a short answer, the title of the paper, and the source article of extraction. The proposed DistilBERT version outperformed previous pre-trained models, obtaining an Exact Match (EM)/F1 score of 80.6/87.3.
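The retrieve-then-rank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pure-Python TF-IDF with cosine similarity (using the same smoothed IDF form that scikit-learn's `TfidfVectorizer` applies by default) and a ranker that scores each candidate answer by the sum of its start and end logits, which is a common convention but is not stated in the abstract. The BERT reader itself is omitted. The names `build_tfidf`, `retrieve`, and `rank_answers`, and the candidate dictionary keys, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Build an L2-normalised sparse TF-IDF vector; unknown terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_tfidf(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [to_vector(toks, idf) for toks in tokenized]


def retrieve(query, idf, vectors, top_k=500):
    """Return up to top_k (doc_index, cosine_score) pairs, best first."""
    q = to_vector(tokenize(query), idf)
    scored = [
        (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
        for i, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Documents sharing no term with the query are not returned.
    return [(i, score) for score, i in scored[:top_k] if score > 0]


def rank_answers(candidates):
    """Pick the candidate with the highest start + end logit sum (an assumption)."""
    return max(candidates, key=lambda c: c["start_logit"] + c["end_logit"])


if __name__ == "__main__":
    docs = [
        "Coronavirus incubation period is estimated at around five days.",
        "Transformer models are used for question answering.",
        "Masks reduce transmission of respiratory viruses.",
    ]
    idf, vectors = build_tfidf(docs)
    for index, score in retrieve("What is the incubation period of the coronavirus?", idf, vectors, top_k=2):
        print(index, round(score, 3), docs[index])
```

In a full system, each retrieved document would be passed to the reader, and the reader's span predictions (answer text, logits, paper title) would form the `candidates` list given to `rank_answers`.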