Aiken Emily L, Nguyen Andre T, Viboud Cecile, Santillana Mauricio
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
Booz Allen Hamilton, Columbia, MD 21044, USA.
Sci Adv. 2021 Jun 16;7(25). doi: 10.1126/sciadv.abb1237. Print 2021 Jun.
Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional health care-based surveillance systems are limited by inherent reporting delays. Machine learning methods have the potential to fill this temporal "data gap," but work to date in this area has focused on relatively simple methods and coarse geographic resolutions (state level and above). We evaluate the predictive performance of a gated recurrent unit (GRU) neural network approach in comparison with baseline machine learning methods for estimating influenza activity in the United States at the state and city levels, and we experiment with the inclusion of real-time Internet search data. We find that the neural network approach improves upon baseline models at long prediction horizons but is not improved by real-time Internet search data. For interpretability, we conduct a thorough analysis of feature importances in all models considered.
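The abstract names a gated recurrent unit approach but does not give its architecture. The following is a minimal, hypothetical sketch of how a GRU consumes a window of weekly influenza-activity values and emits a single forecast. It assumes a single GRU layer, a linear readout, and random untrained weights. The class `GRUForecaster` and all of its parameters are illustrative and are not the authors' model. It uses the gate convention h_t = (1 - z_t) * h_{t-1} + z_t * n_t; some libraries swap the roles of z_t and 1 - z_t.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    return [sum(xs) for xs in zip(*vs)]


class GRUForecaster:
    """Hypothetical toy model: one GRU layer plus a linear readout (untrained)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input-to-hidden (W) and hidden-to-hidden (U) weights plus biases for
        # the update gate "z", the reset gate "r" and the candidate state "n".
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}
        # Linear readout from the final hidden state to one forecast value.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def step(self, x: Vector, h: Vector) -> Vector:
        """One GRU update: returns h_t given input x_t and previous state h_{t-1}."""
        z = [sigmoid(v) for v in add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        # The reset gate scales how much of the previous state feeds the candidate.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in add(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        # The update gate interpolates between the previous state and the candidate.
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def forecast(self, sequence: List[Vector]) -> float:
        """Run the GRU over a window of weekly feature vectors and read out one value."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out


if __name__ == "__main__":
    model = GRUForecaster(input_size=1, hidden_size=4)
    # Hypothetical weekly influenza-like-illness values (one feature per week).
    window = [[1.2], [1.5], [2.1], [2.8]]
    print(model.forecast(window))
```

To predict h weeks ahead, such a model would be trained on pairs of (input window, value observed h weeks after the window ends). Training, the choice of input features (for example, whether Internet search volumes are appended to each week's vector), and the hyperparameters are all omitted here as assumptions outside what the abstract states.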