H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Dr NW, Atlanta, GA, 30332-0205, USA.
Sci Rep. 2022 Jul 7;12(1):11539. doi: 10.1038/s41598-022-15478-y.
As COVID-19 ravages the globe, accurate forecasts of the disease's spread are crucial for situational awareness, resource allocation, and public health decision-making. As an alternative to the traditional disease surveillance data collected by the United States (US) Centers for Disease Control and Prevention (CDC), big data from the Internet, such as online search volumes, also contain valuable information for tracking infectious disease dynamics such as influenza epidemics. In this study, we develop a statistical model that uses Internet search volumes of relevant queries to track and predict the COVID-19 pandemic in the United States. Inspired by the strong association between the COVID-19 death trend and symptom-related search queries such as "loss of taste", we combine search volume information with COVID-19 time series information for US national-level forecasts. For state-level predictions, we leverage a cross-state, cross-resolution spatio-temporal framework that pools information from search volumes and COVID-19 reports across regions. Lastly, we aggregate the state-level frameworks in an ensemble fashion to produce the final state-level 4-week forecasts. Our method outperforms the baseline time-series model and performs reasonably well against other publicly available benchmark models for both national- and state-level forecasts.
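The idea of combining search volume with the COVID-19 time series can be illustrated with a minimal sketch. The abstract does not specify the estimator, so ordinary least squares is used here purely as an assumption, and the data are synthetic. The function names (`build_design`, `fit_and_forecast`), the number of lags, and the one-week horizon are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np

def build_design(deaths, search, n_lags, horizon):
    """Build rows [1, deaths[t-n_lags+1..t], search[t]] with target deaths[t+horizon]."""
    X, y = [], []
    for t in range(n_lags - 1, len(deaths) - horizon):
        row = np.concatenate(([1.0], deaths[t - n_lags + 1 : t + 1], [search[t]]))
        X.append(row)
        y.append(deaths[t + horizon])
    return np.array(X), np.array(y)

def fit_and_forecast(deaths, search, n_lags=3, horizon=1):
    """Fit by least squares (an assumption) and forecast `horizon` weeks ahead."""
    X, y = build_design(deaths, search, n_lags, horizon)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Features for the most recent week, in the same column order as build_design.
    latest = np.concatenate(([1.0], deaths[-n_lags:], [search[-1]]))
    return float(latest @ coef), coef

# Synthetic data (not real surveillance data): search volume leads deaths by one week.
rng = np.random.default_rng(0)
search_full = rng.uniform(10.0, 100.0, size=41)
deaths = 5.0 + 2.0 * search_full[:-1]   # deaths[t] depends on the previous week's searches
search = search_full[1:]                # search[t] is the volume observed in week t

forecast, coef = fit_and_forecast(deaths, search, n_lags=3, horizon=1)
print("next-week forecast:", forecast)
```

Because the synthetic deaths series is an exact linear function of the prior week's search volume, the fit recovers that relationship and the lagged death counts receive near-zero weight. With real data, both sources would typically contribute, and a regularized estimator may be preferable to plain least squares.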