X Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM, United States.
School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ, United States.
J Med Internet Res. 2020 Jul 3;22(7):e14337. doi: 10.2196/14337.
Influenza epidemics result in a public health and economic burden worldwide. Traditional surveillance techniques, which rely on doctor visits, provide data with a delay of 1 to 2 weeks. A means of obtaining real-time data and forecasting future outbreaks is desirable to provide more timely responses to influenza epidemics.
This study aimed to present the first use of a novel dataset and to demonstrate its ability to supplement traditional disease surveillance at multiple spatial resolutions.
We used internet traffic data from the Centers for Disease Control and Prevention (CDC) website to determine the potential usability of this data source. We tested the traffic generated by 10 influenza-related pages in 8 states and 9 census divisions within the United States and compared it against clinical surveillance data.
The most successful model yielded an r value of 0.955; results for the remaining cases ranged from promising to unsuccessful. In the interest of scientific transparency, and to further the understanding of when internet data streams are an appropriate supplemental data source, we also included negative results (ie, unsuccessful models). Models that focused on a single influenza season were more successful than those that attempted to model multiple influenza seasons. Geographic resolution appeared to play a key role, with national and regional models being more successful overall than state-level models.
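As a rough illustration of this kind of comparison, the following minimal sketch scores agreement between weekly page views and a clinical influenza indicator, both per season and pooled across seasons. It assumes a Pearson correlation, which the abstract does not specify. The function names, the record layout, and the numbers are hypothetical and are not taken from the study. The toy data are deliberately constructed so that the traffic scale differs between seasons, which is only one conceivable way for pooled agreement to fall below per-season agreement.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def r_by_season(records):
    """Group (season, page_views, ili_rate) rows by season and score each group."""
    grouped = {}
    for season, views, ili in records:
        xs, ys = grouped.setdefault(season, ([], []))
        xs.append(views)
        ys.append(ili)
    return {season: pearson_r(xs, ys) for season, (xs, ys) in grouped.items()}


# Made-up illustrative numbers, NOT data from the study.
# Each row: (season label, weekly page views, clinical influenza indicator).
records = [
    ("season A", 100, 1.0), ("season A", 200, 2.0), ("season A", 300, 3.0),
    ("season B", 1000, 1.0), ("season B", 2000, 2.0), ("season B", 3000, 3.0),
]

per_season = r_by_season(records)
pooled = pearson_r([v for _, v, _ in records], [i for _, _, i in records])

print(per_season)
print(round(pooled, 3))
```

In this constructed example each season correlates perfectly on its own, while the pooled series correlates much more weakly because the two seasons sit on different traffic scales.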
These results demonstrate that internet data may be able to complement traditional influenza surveillance in some cases but not in others. Specifically, our results show that CDC website traffic may inform national- and division-level models but not models for individual states. In addition, our results show better agreement when the data were split by season rather than aggregated over several years. We anticipate that this work will lead to more complex nowcasting and forecasting models using this data stream.