Hwang Eunju
Department of Applied Statistics, Gachon University, South Korea.
Chaos Solitons Fractals. 2022 Feb;155:111789. doi: 10.1016/j.chaos.2021.111789. Epub 2022 Jan 3.
This paper models and predicts COVID-19 confirmed cases through multiple linear regression, with particular attention to prediction intervals for the case counts. Because the COVID-19 data exhibit a long-memory feature, a heterogeneous autoregression (HAR) is adopted together with growth rates and vaccination rates; the resulting model is called the HAR-G-V model. The eight most affected countries are considered, using their daily confirmed cases and vaccination rates. Model criteria such as root mean square error (RMSE), mean absolute error (MAE), AIC and BIC are reported for the HAR models with and without the two rates. The HAR-G-V model performs better than the other HAR models. Out-of-sample forecasting is conducted with the HAR-G-V model, and forecast accuracy measures such as RMSE, MAE, mean absolute percentage error and root relative square error are computed. Furthermore, three types of prediction intervals are constructed: two by approximating the residuals with normal and Laplace distributions, and one by employing a bootstrap procedure. Empirical coverage probability, average length and mean interval score are evaluated for the three prediction intervals. This work makes three contributions: a novel attempt to combine growth rates and vaccination rates in modeling COVID-19; the construction and comparison of three types of prediction intervals; and an attempt to improve the coverage probability and mean interval score of the prediction intervals via the bootstrap technique.
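The modeling steps summarized above can be illustrated with a minimal sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the lag windows `(1, 7, 14)`, the way extra regressors (such as growth and vaccination rates) enter with a one-step lag, the zero-location Laplace fit, and the simple residual bootstrap. All function names (`har_design`, `fit_ols`, `prediction_intervals`, `interval_score`) are hypothetical. The interval score follows the standard definition: interval width plus a penalty of 2/alpha times the distance by which the observation falls outside the interval.

```python
import math
from statistics import NormalDist

import numpy as np


def har_design(y, lags=(1, 7, 14), extra=None):
    """Build a HAR design matrix: intercept plus the mean of the last h
    observations for each h in `lags` (lag choice is an assumption).
    `extra` (optional, one row per time point) is appended with a one-step lag."""
    y = np.asarray(y, dtype=float)
    p = max(lags)
    cols = [np.ones(len(y) - p)]
    for h in lags:
        cols.append(np.array([y[t - h:t].mean() for t in range(p, len(y))]))
    if extra is not None:
        extra = np.asarray(extra, dtype=float).reshape(len(y), -1)
        cols.extend(extra[p - 1:len(y) - 1].T)  # row t uses extra[t - 1]
    return np.column_stack(cols), y[p:]


def fit_ols(X, target):
    """Ordinary least squares fit; returns coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta, target - X @ beta


def prediction_intervals(point, resid, alpha=0.05, n_boot=2000, seed=0):
    """Three (lower, upper) intervals around a point forecast."""
    resid = np.asarray(resid, dtype=float)
    # Normal approximation: point +/- z * residual standard deviation.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_normal = z * resid.std(ddof=1)
    # Laplace approximation with location 0 (assumption): scale b = mean |resid|,
    # and the (1 - alpha/2) quantile of Laplace(0, b) is -b * ln(alpha).
    half_laplace = -np.mean(np.abs(resid)) * math.log(alpha)
    # Simple residual bootstrap (assumption): resample residuals with
    # replacement and take empirical quantiles.
    rng = np.random.default_rng(seed)
    draws = rng.choice(resid, size=n_boot, replace=True)
    q_lo, q_hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return {
        "normal": (point - half_normal, point + half_normal),
        "laplace": (point - half_laplace, point + half_laplace),
        "bootstrap": (point + q_lo, point + q_hi),
    }


def interval_score(lower, upper, y, alpha=0.05):
    """Interval score for one observation; lower values are better."""
    score = upper - lower
    if y < lower:
        score += (2 / alpha) * (lower - y)
    elif y > upper:
        score += (2 / alpha) * (y - upper)
    return score
```

In use, one would fit on a training window with `fit_ols(*har_design(cases, extra=rates))`, form the point forecast from the last row of regressors, and average `interval_score` over the out-of-sample period to obtain a mean interval score for each interval type.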