COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization.

Authors

Esteva Andre, Kale Anuprit, Paulus Romain, Hashimoto Kazuma, Yin Wenpeng, Radev Dragomir, Socher Richard

Affiliations

Salesforce Research, Palo Alto, CA, USA.

Yale University, New Haven, CT, USA.

Publication information

NPJ Digit Med. 2021 Apr 12;4(1):68. doi: 10.1038/s41746-021-00437-0.

DOI:10.1038/s41746-021-00437-0
PMID:33846532
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8041998/
Abstract

The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset. Here, we present CO-Search, a semantic, multi-stage, search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis. CO-Search is built from two sequential parts: a hybrid semantic-keyword retriever, which takes an input query and returns a sorted list of the 1000 most relevant documents, and a re-ranker, which further orders them by relevance. The retriever is composed of a deep learning model (Siamese-BERT) that encodes query-level meaning, along with two keyword-based models (BM25, TF-IDF) that emphasize the most important words of a query. The re-ranker assigns a relevance score to each document, computed from the outputs of (1) a question-answering module which gauges how much each document answers the query, and (2) an abstractive summarization module which determines how well a query matches a generated summary of the document. To account for the relatively limited dataset, we develop a text augmentation technique which splits the documents into pairs of paragraphs and the citations contained in them, creating millions of (citation title, paragraph) tuples for training the retriever. We evaluate our system ( http://einstein.ai/covid ) on the data of the TREC-COVID information retrieval challenge, obtaining strong performance across multiple key information retrieval metrics.
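
The two-stage design described in the abstract (a hybrid retriever followed by a re-ranker) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: only the two keyword models (BM25, TF-IDF) are implemented, the Siamese-BERT encoder is omitted, reciprocal rank fusion is used as one plausible way to merge rankings, and the `scorer` callable is a hypothetical stand-in for the question-answering and summarization modules.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for the sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_scores(query, docs):
    """Sum of smoothed TF-IDF weights of the query terms in each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log((1 + n) / (1 + df[term])) + 1
                score += (tf[term] / len(toks)) * idf
        scores.append(score)
    return scores


def to_ranking(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each contributes 1 / (k + rank) per document."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: -fused[d])


def retrieve(query, docs, top_k=1000):
    """Stage 1: fuse the keyword rankings and keep the top_k candidates."""
    rankings = [
        to_ranking(bm25_scores(query, docs)),
        to_ranking(tfidf_scores(query, docs)),
        # A semantic ranking from a sentence encoder would be added here.
    ]
    return reciprocal_rank_fusion(rankings)[:top_k]


def rerank(candidates, docs, query, scorer):
    """Stage 2: reorder candidates by a relevance scorer (hypothetical)."""
    return sorted(candidates, key=lambda i: -scorer(query, docs[i]))
```

A toy run: with the documents `["masks reduce coronavirus transmission indoors", "football results and league tables", "coronavirus vaccine trial results"]` and the query `"coronavirus transmission masks"`, `retrieve(query, docs, top_k=2)` keeps the two coronavirus documents and drops the unrelated one. Rank fusion is convenient here because it needs only the orderings, so scores from models on different scales never have to be normalized against each other.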


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3604/8041998/c4a95825771a/41746_2021_437_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3604/8041998/56fe10478edb/41746_2021_437_Fig1_HTML.jpg

Similar articles

1
COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization.
NPJ Digit Med. 2021 Apr 12;4(1):68. doi: 10.1038/s41746-021-00437-0.
2
Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation.
J Med Internet Res. 2024 May 30;26:e52655. doi: 10.2196/52655.
3
A COVID-19 Search Engine (CO-SE) with Transformer-based architecture.
Healthc Anal (N Y). 2022 Nov;2:100068. doi: 10.1016/j.health.2022.100068. Epub 2022 Jun 6.
4
How Does ChatGPT Use Source Information Compared With Google? A Text Network Analysis of Online Health Information.
Clin Orthop Relat Res. 2024 Apr 1;482(4):578-588. doi: 10.1097/CORR.0000000000002995. Epub 2024 Mar 1.
5
Revealing Opinions for COVID-19 Questions Using a Context Retriever, Opinion Aggregator, and Question-Answering Model: Model Development Study.
J Med Internet Res. 2021 Mar 19;23(3):e22860. doi: 10.2196/22860.
6
COBERT: COVID-19 Question Answering System Using BERT.
Arab J Sci Eng. 2021 Jun 23:1-11. doi: 10.1007/s13369-021-05810-5.
7
SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.
8
Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization.
Proc Conf Empir Methods Nat Lang Process. 2020 Nov;2020:3389-3399. doi: 10.18653/v1/2020.findings-emnlp.304.
9
Information Retrieval in an Infodemic: The Case of COVID-19 Publications.
J Med Internet Res. 2021 Sep 17;23(9):e30161. doi: 10.2196/30161.
10
Learning to rank query expansion terms for COVID-19 scholarly search.
J Biomed Inform. 2023 Jun;142:104386. doi: 10.1016/j.jbi.2023.104386. Epub 2023 May 12.

Cited by

1
AI edge cloud service provisioning for knowledge management smart applications.
Sci Rep. 2025 Sep 1;15(1):32246. doi: 10.1038/s41598-025-14429-7.
2
Enhancing biomedical named entity recognition with parallel boundary detection and category classification.
BMC Bioinformatics. 2025 Feb 25;26(1):63. doi: 10.1186/s12859-025-06086-4.
3
Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation.
JMIR AI. 2024 Oct 30;3:e55059. doi: 10.2196/55059.
4
A COVID-19 Search Engine (CO-SE) with Transformer-based architecture.
Healthc Anal (N Y). 2022 Nov;2:100068. doi: 10.1016/j.health.2022.100068. Epub 2022 Jun 6.
5
Semantic matching based legal information retrieval system for COVID-19 pandemic.
Artif Intell Law (Dordr). 2023 Mar 14:1-30. doi: 10.1007/s10506-023-09354-x.
6
Leveraging physiology and artificial intelligence to deliver advancements in health care.
Physiol Rev. 2023 Oct 1;103(4):2423-2450. doi: 10.1152/physrev.00033.2022. Epub 2023 Apr 27.
7
The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges.
PNAS Nexus. 2022 Aug 23;1(3):pgac125. doi: 10.1093/pnasnexus/pgac125. eCollection 2022 Jul.
8
Complex Knowledge Base Question Answering for Intelligent Bridge Management Based on Multi-Task Learning and Cross-Task Constraints.
Entropy (Basel). 2022 Dec 10;24(12):1805. doi: 10.3390/e24121805.
9
LitCovid ensemble learning for COVID-19 multi-label classification.
Database (Oxford). 2022 Nov 25;2022. doi: 10.1093/database/baac103.
10
Exploration of biomedical knowledge for recurrent glioblastoma using natural language processing deep learning models.
BMC Med Inform Decis Mak. 2022 Oct 13;22(1):267. doi: 10.1186/s12911-022-02003-4.

References

1
TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19.
J Am Med Inform Assoc. 2020 Jul 1;27(9):1431-1436. doi: 10.1093/jamia/ocaa091.
2
Research and Development on Therapeutic Agents and Vaccines for COVID-19 and Related Human Coronavirus Diseases.
ACS Cent Sci. 2020 Mar 25;6(3):315-331. doi: 10.1021/acscentsci.0c00272. Epub 2020 Mar 12.