Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, United States of America.
Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States of America.
PLoS Comput Biol. 2022 Sep 8;18(9):e1010251. doi: 10.1371/journal.pcbi.1010251. eCollection 2022 Sep.
Measles is one of the best-documented and most mechanistically studied non-linear infectious disease dynamical systems. However, systematic investigations into the comparative performance of traditional mechanistic models and machine learning approaches in forecasting the transmission dynamics of this pathogen are still rare. Here, we compare one of the most widely used semi-mechanistic models for measles (TSIR) with a commonly used machine learning approach (LASSO), assessing their performance and limits in predicting short- to long-term outbreak trajectories and seasonality for both regular and less regular measles outbreaks in England and Wales (E&W) and the United States. First, our results indicate that the proposed LASSO model can efficiently use data from multiple major cities and achieve short-to-medium-term forecasting performance similar to that of semi-mechanistic models for E&W epidemics. Second, interestingly, the LASSO model also captures the annual-to-biennial bifurcation of measles epidemics in E&W caused by the susceptible response to the late 1940s baby boom. LASSO may also outperform TSIR for predicting less regular dynamics, such as those observed in major cities in the US between 1932 and 1945. Although both approaches perform well for short-term forecasts, accuracy suffers for both methods as we attempt longer-term predictions of highly irregular, post-vaccination outbreaks in E&W. Finally, we illustrate that the LASSO model can both qualitatively and quantitatively reconstruct mechanistic assumptions, notably susceptible dynamics, in the TSIR model. Our results characterize the limits of predictability of infectious disease dynamics for strongly immunizing pathogens with both mechanistic and machine learning models, and identify connections between these two approaches.
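The abstract does not state the TSIR equations. As a rough illustration of the susceptible dynamics it refers to, the sketch below uses a commonly cited deterministic skeleton of the time-series SIR model, I[t+1] = beta[t] * S[t] * I[t]^alpha / N and S[t+1] = S[t] + B[t] - I[t+1]. The function name `simulate_tsir`, the 26-biweek seasonal cycle, and all parameter values are assumptions chosen for illustration; they are not taken from the paper and are not fitted to any data.

```python
import math


def simulate_tsir(s0, i0, births, beta, alpha=0.97, population=1_000_000):
    """Deterministic skeleton of a TSIR-style model (illustrative only).

    I[t+1] = beta[t mod P] * S[t] * I[t]**alpha / N
    S[t+1] = S[t] + B[t] - I[t+1]
    where P = len(beta) is the number of time steps per seasonal cycle.
    """
    s, i = [float(s0)], [float(i0)]
    period = len(beta)
    for t, b in enumerate(births):
        new_cases = beta[t % period] * s[-1] * i[-1] ** alpha / population
        # Infections cannot exceed the susceptibles available at time t.
        new_cases = min(new_cases, s[-1])
        s.append(s[-1] + b - new_cases)
        i.append(new_cases)
    return s, i


# Hypothetical inputs: 10 years of biweekly steps, constant births,
# and a sinusoidal seasonal transmission rate.
STEPS_PER_YEAR = 26
beta = [20.0 * (1.0 + 0.25 * math.cos(2.0 * math.pi * k / STEPS_PER_YEAR))
        for k in range(STEPS_PER_YEAR)]
births = [600.0] * (10 * STEPS_PER_YEAR)

susceptibles, cases = simulate_tsir(s0=50_000, i0=500, births=births, beta=beta)
print(len(cases), round(max(cases)))
```

Raising the values in `births` partway through the series is one way to explore, in this toy setting, how a change in susceptible recruitment can alter the period of the simulated epidemics. A LASSO forecaster of the kind the abstract describes would instead regress future case counts on lagged counts with an L1 penalty; it is not shown here because the abstract does not give its feature set.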