Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India.
Department of Mathematics, Indian Institute of Science, Bangalore, India.
PLoS Comput Biol. 2019 Nov 21;15(11):e1007518. doi: 10.1371/journal.pcbi.1007518. eCollection 2019 Nov.
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world, and it is estimated that more than half the world's population is at risk of developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidence. Data from multiple sources (such as dengue and ILI case counts, electronic health records, and the frequency of multiple internet search terms from Google Trends) can improve forecasts, but standard time series analysis methods cannot estimate all the parameter values of a multi-source model from the limited amount of data available. In this paper, we use a computationally efficient implementation of a known variable selection method, which we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines a sparse representation of time series data, electronic health records data (for ILI), and Google Trends data to forecast dengue and ILI incidence. The sparse representation is built by an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states (Brazil, Mexico, Singapore, Taiwan, and Thailand) and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.
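The idea of building a sparse autoregressive model by maximizing a likelihood ratio at every step can be illustrated with a generic greedy forward selection. The sketch below is not the authors' ARLR implementation. It assumes a linear model with Gaussian errors, so the likelihood ratio statistic for adding one regressor is n·log(RSS_old / RSS_new). It also assumes a stopping threshold of 3.84, the 5% critical value of a chi-square distribution with one degree of freedom. The function names (`build_lagged_design`, `greedy_lr_select`) and the parameters `max_lag`, `threshold`, and `max_terms` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_lagged_design(y, exog, max_lag):
    """Build candidate regressors: lags 1..max_lag of y and of each exogenous series.

    y    : 1-D array of case counts (length n)
    exog : 2-D array (n, k) of external series, e.g. search-term frequencies
    Returns (X, target, names), where row i of X holds the lagged values
    used to predict target[i] = y[max_lag + i].
    """
    n = len(y)
    cols, names = [], []
    for lag in range(1, max_lag + 1):
        cols.append(y[max_lag - lag : n - lag])
        names.append(f"y_lag{lag}")
        for j in range(exog.shape[1]):
            cols.append(exog[max_lag - lag : n - lag, j])
            names.append(f"x{j}_lag{lag}")
    return np.column_stack(cols), y[max_lag:], names


def rss(X, target):
    """Residual sum of squares of the least-squares fit of target on X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid)


def greedy_lr_select(X, target, threshold=3.84, max_terms=10):
    """Greedy forward selection driven by a Gaussian likelihood ratio statistic.

    At each step, add the candidate column with the largest statistic
    n * log(RSS_current / RSS_trial). Stop when the best statistic falls
    below `threshold` (assumed: chi-square, 1 d.o.f., 5% level) or when
    `max_terms` columns have been chosen.
    """
    n = len(target)
    intercept = np.ones((n, 1))
    selected = []
    current_rss = rss(intercept, target)
    while len(selected) < max_terms:
        best_stat, best_j, best_rss = 0.0, None, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            trial = np.column_stack([intercept, X[:, selected + [j]]])
            trial_rss = rss(trial, target)
            stat = n * np.log(current_rss / trial_rss)
            if stat > best_stat:
                best_stat, best_j, best_rss = stat, j, trial_rss
        if best_j is None or best_stat < threshold:
            break
        selected.append(best_j)
        current_rss = best_rss
    return selected
```

Calling `build_lagged_design` on a case-count series and a matrix of external signals, then passing the result to `greedy_lr_select`, returns the column indices of the retained lags in the order they were added. A forecast would then come from an ordinary least-squares fit on those columns only. The lasso, by contrast, shrinks all coefficients jointly through an L1 penalty rather than admitting one term at a time.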