Xu Qinneng, Gel Yulia R, Ramirez Ramirez L Leticia, Nezafati Kusha, Zhang Qingpeng, Tsui Kwok-Leung
City University of Hong Kong, Hong Kong SAR, China.
University of Texas at Dallas, Dallas, United States of America.
PLoS One. 2017 May 2;12(5):e0176690. doi: 10.1371/journal.pone.0176690. eCollection 2017.
The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, for forecasting new cases of influenza-like illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, our approach fuses multiple offline and online data sources.
Four individual models are employed to forecast ILI-GOPC both one week and two weeks in advance: a generalized linear model (GLM), the least absolute shrinkage and selection operator (LASSO), an autoregressive integrated moving average (ARIMA) model, and deep learning (DL) with feedforward neural networks (FNN). The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into the surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strengths of each individual forecasting model, we apply statistical model fusion using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and the covariates.
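As an illustration of the model-fusion idea, the following is a minimal sketch of BMA-style weighting, not the authors' implementation. It assumes that each model's weight is proportional to its Gaussian likelihood over a recent window of past forecasts, which is a simplification of full BMA; the function names `bma_weights` and `bma_forecast` and the `window` parameter are hypothetical.

```python
import numpy as np


def bma_weights(y, preds, window=None):
    """Approximate posterior model weights from recent forecast errors.

    y      : observed ILI values, shape (n,)
    preds  : forecasts of k models for the same weeks, shape (k, n)
    window : if given, use only the most recent `window` weeks
             (a hypothetical stand-in for the adaptive approach)
    """
    y = np.asarray(y, dtype=float)
    preds = np.asarray(preds, dtype=float)
    if window is not None:
        y = y[-window:]
        preds = preds[:, -window:]

    n = y.size
    resid = y - preds                                   # shape (k, n)
    # Per-model error variance (maximum-likelihood estimate), floored for stability.
    sigma2 = np.maximum(np.mean(resid ** 2, axis=1), 1e-12)
    # Gaussian log-likelihood evaluated at the estimated variance.
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    # Equal prior weights assumed; subtract the max before exponentiating.
    w = np.exp(loglik - loglik.max())
    return w / w.sum()


def bma_forecast(weights, new_preds):
    """Combine the k models' next forecasts using the weights."""
    return float(np.dot(weights, np.asarray(new_preds, dtype=float)))


# Toy data: model 0 tracks the observations closely, model 1 is biased.
y_obs = [1.0, 2.0, 3.0, 4.0]
model_preds = [[1.1, 2.1, 2.9, 4.0],
               [2.0, 3.0, 4.0, 5.0]]
weights = bma_weights(y_obs, model_preds)
fused = bma_forecast(weights, [5.0, 6.0])
```

In this toy example nearly all of the weight goes to the first model, so the fused forecast stays close to that model's prediction. Re-estimating the weights on a sliding window each week is one way to keep the combination responsive to recent data.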
DL with FNN appears to deliver the most competitive predictive performance among the four individual models considered. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting the locations of influenza peaks.
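The two evaluation metrics named above can be sketched as follows. The abstract does not give the formulas, so this example assumes that MAPE takes the common relative form (mean absolute error divided by the observed value); the function names are illustrative.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and forecast values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute error relative to the observed values.

    Assumed form; requires all observed values to be nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


# Small worked example with made-up numbers.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # sqrt(4/3)
example_mape = mape([2.0, 4.0], [1.0, 5.0])             # (0.5 + 0.25) / 2
```

RMSE penalizes large misses more heavily, while the relative form of MAPE weights errors by the size of the observed count, so the two metrics can rank models differently.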
The proposed approach can be viewed as a feasible alternative for forecasting ILI in Hong Kong or in other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable and computationally efficient.