Office of Data Science and Emerging Technologies, NIAID, NIH, Rockville, MD, USA.
Both authors contributed to the work equally.
AMIA Annu Symp Proc. 2022 Feb 21;2021:466-475. eCollection 2021.
After the emergence of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) in 2019, identification of immune correlates of protection (CoPs) has become increasingly important for understanding the immune response to SARS-CoV-2. The vast amount of preprint and published literature related to COVID-19 makes it challenging for researchers to stay up to date on research results regarding CoPs against SARS-CoV-2. To address this problem, we developed a machine learning classifier to identify papers relevant to CoPs and a customized named entity recognition (NER) model to extract terms of interest, including CoPs, vaccines, assays, and animal models. A user-friendly visualization tool was populated with the extracted and normalized NER results and associated publication information, including links to full-text articles and clinical trial information where available. The goal of this pilot project is to provide a basis for developing real-time informatics platforms that can provide researchers with scientific insights from emerging research.