Program in Public Health, College of Health Sciences, University of California, Irvine, California, USA.
Department of Computer Science, University of California, Irvine, California, USA.
BMC Public Health. 2019 Jun 14;19(1):761. doi: 10.1186/s12889-019-7103-8.
Zika virus (ZIKV) is an emerging mosquito-borne arbovirus that can produce serious public health consequences. In 2016, ZIKV caused an epidemic in many countries around the world, including the United States. ZIKV surveillance and vector control are essential to combating future epidemics. However, delays in the publication of case reports significantly limit the effectiveness of current surveillance methods, and in many countries with poor infrastructure, established systems for case reporting often do not exist. Previous studies investigating the H1N1 pandemic, general influenza and the recent Ebola outbreak have demonstrated that time- and geo-tagged Twitter data, which are available immediately, can be used to overcome these limitations.
In this study, we employed a recently developed system called Cloudberry to filter a random sample of Twitter data and investigate the feasibility of using such data for ZIKV epidemic tracking at the national and state (Florida) levels. Two auto-regressive models were calibrated using weekly ZIKV case counts and Zika-related tweets in order to estimate weekly ZIKV cases one week in advance.
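As a minimal sketch of what a one-week-ahead auto-regressive model with a tweet covariate could look like, the Python below fits next week's case count as a linear function of the current week's cases and tweets. The model form, the ordinary least-squares fit, and the names `fit_arx` and `predict_next` are assumptions made for illustration; the abstract does not specify the lag structure or fitting procedure, and the numbers are synthetic.

```python
import numpy as np


def fit_arx(cases, tweets):
    """Least-squares fit of cases[t+1] = b0 + b1*cases[t] + b2*tweets[t].

    Hypothetical model form: one lag of cases plus one lag of tweets.
    Returns the coefficient array [b0, b1, b2].
    """
    cases = np.asarray(cases, dtype=float)
    tweets = np.asarray(tweets, dtype=float)
    # Design matrix: intercept, this week's cases, this week's tweets.
    X = np.column_stack([np.ones(len(cases) - 1), cases[:-1], tweets[:-1]])
    y = cases[1:]  # target is next week's case count
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_next(coef, cases_now, tweets_now):
    """Estimate next week's cases from this week's cases and tweets."""
    return float(coef[0] + coef[1] * cases_now + coef[2] * tweets_now)


# Synthetic weekly series (not real surveillance data).
weekly_tweets = [120, 150, 310, 480, 620, 540, 400, 260]
weekly_cases = [3, 5, 9, 21, 34, 40, 31, 22]

coef = fit_arx(weekly_cases, weekly_tweets)
forecast = predict_next(coef, weekly_cases[-1], weekly_tweets[-1])
```

A fit of this kind can then be checked by comparing predicted and observed weekly counts, which is the comparison the results below report.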
While the models tended to over-predict at low case counts and under-predict at extremely high counts, a comparison of predicted versus observed weekly ZIKV case counts following model calibration demonstrated reasonable overall predictive accuracy, with an R of 0.74 for the Florida model and 0.70 for the U.S. model.
Time-series analysis of predicted and observed ZIKV cases following internal cross-validation exhibited very similar patterns, demonstrating reasonable model performance. Spatially, the distributions of cumulative ZIKV case counts (local and travel-related) and Zika-related tweets across all 50 U.S. states were highly correlated (r = 0.73) after adjusting for population.
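The population-adjusted spatial correlation can be illustrated with a short sketch. It assumes that "adjusting for population" means converting both cumulative cases and tweets to per-capita rates before computing Pearson's r; the abstract does not state the exact adjustment, so the function `per_capita_correlation` and the three-state inputs are hypothetical.

```python
import numpy as np


def per_capita_correlation(cases, tweets, population, per=100_000):
    """Pearson r between per-capita case rates and per-capita tweet rates.

    Assumed adjustment: divide each state's count by its population
    and scale to a rate per `per` residents.
    """
    cases = np.asarray(cases, dtype=float)
    tweets = np.asarray(tweets, dtype=float)
    population = np.asarray(population, dtype=float)
    case_rate = cases / population * per
    tweet_rate = tweets / population * per
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(case_rate, tweet_rate)[0, 1])


# Synthetic state-level totals (not real data).
state_cases = [10, 40, 5]
state_tweets = [900, 2500, 700]
state_population = [1_000_000, 2_000_000, 500_000]

r = per_capita_correlation(state_cases, state_tweets, state_population)
```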
This study demonstrates the value of Twitter data for disease surveillance. Such data can support epidemiologists and public health officials charged with protecting the public during future outbreaks.