National Institute of Informatics, Tokyo, Japan.
Asian Institute of Technology, School of Engineering and Technology, Bangkok, Thailand.
BMC Infect Dis. 2019 Mar 21;19(1):272. doi: 10.1186/s12879-019-3874-x.
The goal of this research is to create a system that uses the available information about the factors responsible for the spread of dengue to predict the occurrence of dengue within a geographical region, so that public health experts can prepare for, manage and control the epidemic. Our study presents new geospatial insights into the understanding and management of health, disease and health-care systems.
We present a machine learning-based methodology that forecasts dengue in each of the fifty districts of Thailand by leveraging data from multiple sources. Using a set of prediction variables, we show that the prediction accuracy of the model increases with an optimal combination of predictors, which includes meteorological data, clinical data, lag variables of disease surveillance, socioeconomic data and data encoding the spatial dependence of dengue transmission. We use Generalized Additive Models (GAMs) to fit the relationships between the predictors (with a lag of one month) and the clinical data on dengue hemorrhagic fever (DHF), using data from 2008 to 2012. Using data from 2013 to 2015 and a comparative set of prediction models, we evaluate the predictive ability of the fitted models according to the root mean square error (RMSE) and standardized RMSE (SRMSE), as well as the adjusted R-squared value, the deviance explained and the change in the Akaike information criterion (AIC).
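The two error measures named above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and SRMSE is assumed here to be the RMSE divided by the population standard deviation of the observed values, which is one common definition and may differ from the one used in the study.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted case counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def srmse(observed, predicted):
    """RMSE scaled by the spread of the observations.

    Assumption: the scaling factor is the population standard deviation
    of the observed series; the abstract does not state the definition.
    """
    spread = statistics.pstdev(observed)
    if spread == 0:
        raise ValueError("observed values have zero variance")
    return rmse(observed, predicted) / spread
```

Scaling by the spread of the observations makes the error comparable across districts whose case counts differ by orders of magnitude, which a raw RMSE does not allow.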
The model combines different predictors to make forecasts with a lead time of one month and also describes the statistical significance of the variables used to characterize the forecast. The discriminating ability of the final model was evaluated against a Bangkok-specific constant threshold and the WHO moving threshold of the epidemic in terms of sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).
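The threshold-based evaluation described above can be illustrated with a short sketch. It assumes, for illustration only, that a month counts as an outbreak when the case count exceeds that month's threshold; the function name and the per-month `thresholds` list are hypothetical. A constant threshold is passed as the same value repeated, and a moving threshold as one value per month. How the thresholds themselves are derived is outside the scope of this sketch.

```python
def outbreak_metrics(observed, predicted, thresholds):
    """Compare predicted and observed outbreak months against thresholds.

    A month is treated as an outbreak when its count exceeds the
    threshold for that month (an assumption made for this sketch).
    """
    if not (len(observed) == len(predicted) == len(thresholds)):
        raise ValueError("all inputs must have the same length")

    tp = fp = tn = fn = 0
    for obs, pred, limit in zip(observed, predicted, thresholds):
        actual = obs > limit    # outbreak really occurred
        flagged = pred > limit  # model forecast an outbreak
        if actual and flagged:
            tp += 1
        elif actual and not flagged:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when the denominator is zero, e.g. no outbreak months.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

For example, `outbreak_metrics(obs, pred, [30] * len(obs))` would evaluate a series against a constant threshold of 30 cases per month; the value 30 is purely illustrative.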
The out-of-sample validation showed poorer results than the in-sample validation; however, it demonstrated the ability to detect outbreaks up to one month ahead. We also determine that, for predicting dengue outbreaks within a district, the influence of dengue incidence and socioeconomic data from the surrounding districts is statistically significant. This supports the influence of people's movement patterns and the spatial heterogeneity of human activities on the spread of the epidemic.