Department of Economics, University of Waterloo, Waterloo, ON, Canada.
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada.
Front Public Health. 2023 Dec 11;11:1259410. doi: 10.3389/fpubh.2023.1259410. eCollection 2023.
There is a vast literature on the performance of different short-term forecasting models for country-specific COVID-19 cases, but much less research on city-level cases. This paper uses daily case counts for 25 Metropolitan Statistical Areas (MSAs) in the U.S. to evaluate the accuracy of a variety of statistical forecasting models for 7-day-ahead and 28-day-ahead predictions.
This study employed Gradient Boosted Regression Trees (GBRT), Linear Mixed Effects (LME), Susceptible-Infectious-Recovered (SIR), and Seasonal Autoregressive Integrated Moving Average (SARIMA) models to generate daily forecasts of COVID-19 cases from November 2020 to March 2021.
Consistent with other research that has employed Machine Learning (ML) based methods, we find that Median Absolute Percentage Error (MAPE) values for both 7-day-ahead and 28-day-ahead predictions from GBRTs are lower than the corresponding values from SIR, LME, and SARIMA specifications for the majority of MSAs during November-December 2020 and January 2021. Neither GBRT nor SARIMA models offer high-quality predictions for February 2021. For March 2021, however, SARIMA-generated MAPE values for 28-day-ahead predictions are slightly lower than the corresponding GBRT estimates.
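As an illustration of the accuracy metric, the following is a minimal sketch of a median absolute percentage error calculation. The function name `median_ape`, the handling of zero-count days, and the sample numbers are assumptions made for this example; they are not taken from the study's own code or data.

```python
from statistics import median


def median_ape(actual, predicted):
    """Median of the absolute percentage errors, expressed in percent.

    Hypothetical helper: days with an actual count of zero are skipped
    because the percentage error is undefined there (an assumption of
    this sketch, not a rule stated in the paper).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [
        100.0 * abs(a - p) / abs(a)
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    if not errors:
        raise ValueError("no nonzero actual values to score")
    return median(errors)


# Made-up daily case counts and forecasts for one area.
actual_cases = [100, 200, 400]
forecast_cases = [110, 180, 400]
score = median_ape(actual_cases, forecast_cases)
print(f"Median APE: {score:.1f}%")
```

Taking the median rather than the mean makes the summary less sensitive to a few days with very large percentage errors, which can occur when actual counts are small.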
The results of this research demonstrate that basic ML models can lead to relatively accurate forecasts at the local level, which is important for resource allocation decisions and epidemiological surveillance by policymakers.