
An infodemiological framework for tracking the spread of SARS-CoV-2 using integrated public data.

Authors

Liu Zhimin, Jiang Zuodong, Kip Geoffrey, Snigdha Kirti, Xu Jennings, Wu Xiaoying, Khan Najat, Schultz Timothy

Affiliation

Janssen R&D Data Science, Janssen Research and Development, 2341 S Whittmore St, Titusville 08560, Furlong, PA 18925, United States.

Publication information

Pattern Recognit Lett. 2022 Jun;158:133-140. doi: 10.1016/j.patrec.2022.04.030. Epub 2022 Apr 26.

Abstract

The outbreak of the SARS-CoV-2 novel coronavirus has caused a health crisis of immeasurable magnitude. Signals from heterogeneous public data sources could serve as early predictors of infection waves, particularly in the early phases of the pandemic, when infection data were scarce. In this article, we characterize temporal pandemic indicators by leveraging an integrated set of public data and apply them to a Prophet model to predict COVID-19 trends. An effective natural language processing pipeline was first built to extract time-series signals of specific articles from a news corpus. Bursts in these temporal signals were then identified with Kleinberg's burst detection algorithm. Across different US states, Google Trends for COVID-19 related terms, COVID-19 news volume, and publicly available wastewater SARS-CoV-2 measurements were generally highly correlated with weekly COVID-19 case numbers at lags ranging from 0 to 3 weeks, indicating that they are strong predictors of viral spread. Incorporating the time-series signals of these predictors significantly improved the performance of the Prophet model, which was able to predict COVID-19 case numbers one and two weeks ahead with average mean absolute error rates of 0.38 and 0.46, respectively, across different states.
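
The lagged-correlation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pearson` and `lagged_correlations` are hypothetical, the data are synthetic, and the sketch assumes a plain Pearson correlation between a weekly signal and the case series shifted by 0 to 3 weeks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def lagged_correlations(signal, cases, max_lag=3):
    """Correlate signal[t] with cases[t + lag] for lag = 0..max_lag weeks."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` signal weeks and the first `lag` case weeks
        # so that each signal value is paired with cases `lag` weeks later.
        shifted_signal = signal[: len(signal) - lag]
        shifted_cases = cases[lag:]
        result[lag] = pearson(shifted_signal, shifted_cases)
    return result

# Synthetic example (assumed data): cases trail the signal by two weeks.
signal = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
cases = [0, 0] + signal[:-2]

corr = lagged_correlations(signal, cases)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))
```

In this toy series the correlation peaks at the lag built into the data; with real Google Trends, news-volume, or wastewater series, the peak lag would be estimated per state in the same way.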


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f88/9040481/f034e8c88997/gr1_lrg.jpg
