McGough Sarah F, Brownstein John S, Hawkins Jared B, Santillana Mauricio
Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, United States of America.
PLoS Negl Trop Dis. 2017 Jan 13;11(1):e0005295. doi: 10.1371/journal.pntd.0005295. eCollection 2017 Jan.
BACKGROUND: Over 400,000 people across the Americas are thought to have been infected with Zika virus as a consequence of the 2015-2016 Latin American outbreak. Official government-led case count data in Latin America are typically delayed by several weeks, making it difficult to track the disease as it spreads. Timely disease tracking systems are therefore needed to design and assess interventions to mitigate disease transmission.
METHODOLOGY/PRINCIPAL FINDINGS: We combined information from Zika-related Google searches, Twitter microblogs, and the HealthMap digital surveillance system with historical suspected Zika case counts to track and predict weekly suspected Zika cases during the 2015-2016 Latin American outbreak, up to three weeks ahead of the publication of official case data. We evaluated the predictive power of these data and used a dynamic multivariable approach to retrospectively produce predictions of weekly suspected cases for five countries: Colombia, El Salvador, Honduras, Venezuela, and Martinique. Models that combined Google (and Twitter data where available) with autoregressive information showed the best out-of-sample predictive accuracy for 1-week ahead predictions, whereas models that used only Google and Twitter typically performed best for 2- and 3-week ahead predictions.
CONCLUSIONS/SIGNIFICANCE: Given the significant delay in the release of official government-reported Zika case counts, we show that these Internet-based data streams can be used as timely and complementary ways to assess the dynamics of the outbreak.
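The abstract does not state the regression form behind the "dynamic multivariable approach", so the following is only a minimal sketch of the general idea: each week, a model is refit on all data observed so far and used to predict suspected cases a fixed number of weeks ahead from recent case counts (the autoregressive part) plus a search-volume signal. Ordinary least squares, the expanding training window, the function names, the lag count, the minimum training length, and the synthetic data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def design_row(cases, searches, t, n_lags):
    """Predictors known at week t: intercept, last n_lags case counts, search volume."""
    return np.concatenate(([1.0], cases[t - n_lags + 1 : t + 1], [searches[t]]))


def rolling_forecast(cases, searches, horizon=1, n_lags=2, min_train=12):
    """Refit least squares every week on past data only; predict `horizon` weeks ahead.

    Returns a dict mapping target week index -> out-of-sample prediction.
    """
    cases = np.asarray(cases, dtype=float)
    searches = np.asarray(searches, dtype=float)
    first = n_lags - 1  # earliest week that has a full set of lags
    preds = {}
    for now in range(first + horizon + min_train, len(cases) - horizon):
        # Training pairs: predictors at week t, target at week t + horizon,
        # restricted to targets already observed by week `now`.
        weeks = range(first, now - horizon + 1)
        X = np.array([design_row(cases, searches, t, n_lags) for t in weeks])
        y = cases[first + horizon : now + 1]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        preds[now + horizon] = float(design_row(cases, searches, now, n_lags) @ coef)
    return preds


# Synthetic weekly series (made up for illustration, not real surveillance data).
rng = np.random.default_rng(0)
searches = rng.uniform(0, 100, size=40)
cases = np.empty(40)
cases[0] = 50.0
for t in range(39):
    cases[t + 1] = 0.6 * cases[t] + 1.5 * searches[t] + rng.normal(0, 5)

one_week = rolling_forecast(cases, searches, horizon=1)
three_week = rolling_forecast(cases, searches, horizon=3)
```

Dropping the lagged case counts from `design_row` would give a search-only variant of the kind the abstract reports as typically performing best at the 2- and 3-week horizons; comparing the two variants' errors per horizon on real data is the evaluation the abstract describes.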