Am J Epidemiol. 2022 Sep 28;191(10):1803-1812. doi: 10.1093/aje/kwac090.
Dengue is a serious public health concern in Brazil and globally. In the absence of a universal vaccine or specific treatment, prevention relies on vector control and disease surveillance, and accurate, early forecasts can help reduce the spread of the disease. In this study, we developed a model to predict monthly dengue cases in Brazilian cities 1 month ahead, using data from 2007-2019. We compared different machine learning algorithms and feature selection methods using epidemiologic and meteorological variables. Different models worked best in different cities, but overall a random forest model trained on monthly dengue cases performed best: it produced lower errors than a seasonal naive baseline model, gradient boosting regression, a feed-forward neural network, or support vector regression. For each city, we computed the mean absolute error (MAE) between the predicted and true monthly numbers of dengue cases in the test data set. The median MAE across all cities was 12.2 cases; selecting the optimal combination of algorithm and input features for each city individually reduced it to 11.9 cases. Machine learning, and especially decision tree ensemble models, may contribute to dengue surveillance in Brazil, as they produce low out-of-sample prediction errors for a geographically diverse set of cities.
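The evaluation protocol summarised above (a seasonal naive baseline, a per-city MAE on the test period, and the median of those MAEs across cities) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each city's monthly case counts are a plain Python list, a 12-month season, and a single chronological train/test split at index `test_start`. The function names and toy numbers are hypothetical.

```python
from statistics import median

# Hypothetical helpers illustrating the evaluation described in the abstract;
# they are not taken from the study's code.

SEASON = 12  # assumption: monthly series with a yearly (12-month) cycle


def seasonal_naive_forecasts(series, test_start, season=SEASON):
    """Forecast each month from test_start onward with the count observed
    `season` months earlier (the seasonal naive baseline)."""
    if test_start < season:
        raise ValueError("need at least one full season before the test period")
    return [series[t - season] for t in range(test_start, len(series))]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted counts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def median_city_mae(cases_by_city, test_start, season=SEASON):
    """Compute the test-set MAE for every city, then the median across cities."""
    per_city = {}
    for city, series in cases_by_city.items():
        predictions = seasonal_naive_forecasts(series, test_start, season)
        per_city[city] = mean_absolute_error(series[test_start:], predictions)
    return median(per_city.values()), per_city


if __name__ == "__main__":
    # Toy, made-up monthly counts (two years per city); not data from the study.
    toy_cases = {
        "city_a": [10, 12, 30, 50, 40, 20, 10, 8, 6, 5, 7, 9,
                   12, 14, 28, 55, 38, 22, 11, 9, 5, 6, 8, 10],
        "city_b": [0] * 12 + [3] * 12,
        "city_c": [5] * 24,
    }
    overall, per_city = median_city_mae(toy_cases, test_start=12)
    print(per_city)  # {'city_a': 1.75, 'city_b': 3.0, 'city_c': 0.0}
    print(overall)   # 1.75
```

Replacing `seasonal_naive_forecasts` with a fitted model's 1-month-ahead predictions (for example, a random forest trained on lagged monthly counts) would give the kind of per-city comparison the abstract reports. Summarising with the median rather than the mean keeps a few cities with very large outbreaks from dominating the overall error.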