Hwang Eunju, Yu SeongMin
Department of Applied Statistics, Gachon University, South Korea.
Results Phys. 2021 Oct;29:104631. doi: 10.1016/j.rinp.2021.104631. Epub 2021 Aug 21.
This paper presents a time series analysis of COVID-19 in South Korea. We adopt heterogeneous autoregressive (HAR) time series models and discuss statistical inference for various COVID-19 data. Seven data sets are analyzed: cumulative confirmed (CC) cases, cumulative recovered (CR) cases and cumulative death (CD) cases, as well as the recovery rate, the fatality rate and the infection rates for 14 and 21 days. In the HAR models, the orders are selected by evaluating root mean square error (RMSE) and mean absolute error (MAE) as well as R², AIC, and BIC. From the estimation, we report coefficient estimates, standard errors and 95% confidence intervals for the HAR models. Our results show that the fitted values from the HAR models closely match the real cumulative cases, and that the first differences of the fitted values closely match the real daily cases. Additionally, because the CC and the CD cases are strongly correlated, we use a bivariate HAR model for these two data sets. Out-of-sample forecasts are carried out with the COVID-19 data sets to obtain multi-step-ahead predicted values and 95% prediction intervals. Forecasting performance is assessed with four accuracy measures: RMSE, MAE, mean absolute percentage error (MAPE) and root relative square error (RRSE). The contributions of this work are threefold. First, it is shown that the HAR models fit the cumulative COVID-19 counts well, with good values of the selection criteria. Second, a variety of COVID-19 series are analyzed: confirmed, recovered and death cases, as well as the related rates. Third, the forecast accuracy measures take small error values, and thus it is concluded that the HAR model provides a good prediction model for COVID-19.
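To make the modelling steps concrete, the following is a minimal sketch of a univariate HAR fit, a recursive multi-step forecast, and the four accuracy measures named above. It is not the authors' implementation. The lag orders (1, 7, 14), the plain least-squares estimator, the function names and the RRSE formula (squared error relative to the squared deviation of the actual values from their mean) are all assumptions made for illustration; the abstract does not state the selected orders or the exact definitions used.

```python
import numpy as np

def har_design(y, lags=(1, 7, 14)):
    """Build HAR regressors: an intercept plus, for each h in lags,
    the mean of the h observations preceding time t.
    The lag orders are illustrative assumptions, not the paper's choices."""
    y = np.asarray(y, dtype=float)
    p = max(lags)
    rows = [[1.0] + [y[t - h:t].mean() for h in lags] for t in range(p, len(y))]
    return np.array(rows), y[p:]

def fit_har(y, lags=(1, 7, 14)):
    """Estimate HAR coefficients by ordinary least squares."""
    X, target = har_design(y, lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

def forecast_har(y, beta, steps, lags=(1, 7, 14)):
    """Recursive multi-step forecast: each prediction is appended to the
    history and reused when forming the next step's lagged means."""
    hist = [float(v) for v in y]
    preds = []
    for _ in range(steps):
        x = [1.0] + [np.mean(hist[-h:]) for h in lags]
        pred = float(np.dot(beta, x))
        preds.append(pred)
        hist.append(pred)
    return np.array(preds)

def accuracy(actual, pred):
    """RMSE, MAE, MAPE (in percent) and RRSE for a set of forecasts.
    MAPE assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = actual - pred
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / actual))),
        "RRSE": float(np.sqrt(np.sum(err ** 2)
                              / np.sum((actual - actual.mean()) ** 2))),
    }
```

In use, one would hold out the last few observations of a cumulative series, call `fit_har` on the rest, produce `forecast_har(train, beta, steps=len(test))`, and pass the held-out values and forecasts to `accuracy`. Daily cases can then be compared with `np.diff` of the fitted or forecast cumulative values. Prediction intervals and the bivariate model are omitted from this sketch.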