Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
Department of Statistics, Ferdowsi University of Mashhad, Mashhad, Iran.
BMC Public Health. 2024 Jan 10;24(1):148. doi: 10.1186/s12889-023-17627-y.
Many forecasting algorithms are available for univariate time series, ranging from simple methods to sophisticated, computationally intensive ones. In practice, the sheer number of candidates makes it difficult to select the most appropriate algorithm. Expert knowledge is needed to make an informed choice, but it is not always available because of limited time, money, and manpower.
In this study, we used coronavirus disease 2019 (COVID-19) data comprising the daily absolute numbers of confirmed cases, deaths, and recoveries in 187 countries from February 20, 2020, to May 25, 2021. Two popular forecasting models were used to forecast the data: the Auto-Regressive Integrated Moving Average (ARIMA) model and TBATS, an exponential smoothing state-space model with Trigonometric seasonality, Box-Cox transformation, ARMA errors, and Trend and Seasonal components. The forecasts were evaluated by the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE) criteria in order to label each time series. Characteristics of each time series, based on its univariate structure, were extracted as meta-features. Four machine-learning classification algorithms, namely support vector machine (SVM), decision tree (DT), random forest (RF), and artificial neural network (ANN), were then used as meta-learners to recommend an appropriate forecasting model.
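The labeling step can be illustrated with a minimal sketch. The abstract does not state how the four criteria were combined into a single label, so choosing the model with the lowest value of one criterion is an assumption made here, as are the function names, the handling of zero denominators, and the particular SMAPE variant. The numbers are toy values, not study data.

```python
import math

def rmse(actual, forecast):
    # Root mean squared error.
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    # Mean absolute error.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    # Assumption: points with a zero actual value are skipped.
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(terms) / len(terms) if terms else float("nan")

def smape(actual, forecast):
    # Symmetric MAPE, in percent (one common variant; others exist).
    # Assumption: points where both values are zero are skipped.
    terms = [2 * abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast) if abs(a) + abs(f) != 0]
    return 100 * sum(terms) / len(terms) if terms else float("nan")

def label_series(actual, forecasts, metric=rmse):
    # Hypothetical labeling rule: the model with the lowest error wins.
    return min(forecasts, key=lambda name: metric(actual, forecasts[name]))

# Toy hold-out window and toy forecasts from the two candidate models.
actual = [100, 110, 120, 130]
forecasts = {
    "ARIMA": [102, 108, 123, 128],
    "TBATS": [95, 115, 112, 140],
}

label = label_series(actual, forecasts)
print(label)
```

Passing `metric=mae`, `metric=mape`, or `metric=smape` applies the same rule under a different criterion; a majority vote across the four criteria would be another plausible design.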
The DT model classified the time series better than the other meta-learners. Its accuracy was 87.50% in the training phase and 82.50% in the testing phase. In the training phase, the sensitivity of the DT algorithm was 86.58% and its specificity was 88.46%; in the testing phase, they were 73.33% and 88%, respectively.
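For readers less familiar with these measures, the sketch below shows how accuracy, sensitivity, and specificity follow from a two-class confusion matrix. The counts are hypothetical: the abstract does not report the confusion matrix or say which model is treated as the positive class. They were chosen only because they are arithmetically consistent with the testing-phase percentages above.

```python
def classification_metrics(tp, fn, tn, fp):
    # tp/fn: positive-class series classified correctly/incorrectly.
    # tn/fp: negative-class series classified correctly/incorrectly.
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts (assumption, not taken from the study).
metrics = classification_metrics(tp=11, fn=4, tn=22, fp=3)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
print(percentages)
# {'accuracy': 82.5, 'sensitivity': 73.33, 'specificity': 88.0}
```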
In general, the meta-learning approach was able to predict the appropriate forecasting model (ARIMA or TBATS) from a set of time series features. Given the characteristics of a COVID-19 time series of interest, the DT model might recommend either ARIMA or TBATS for forecasting trends in confirmed, death, and recovered cases.
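The abstract does not list which time series features feed the recommendation. As a hedged illustration only, the sketch below computes a few generic univariate descriptors (length, mean, standard deviation, lag-1 autocorrelation, and linear trend slope) of the kind a meta-learner could take as input. The feature set and function names are assumptions, not the study's actual meta-features.

```python
import statistics

def lag1_autocorrelation(series):
    # Sample autocorrelation at lag 1; defined as 0.0 for a constant series.
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - mean) * (series[i - 1] - mean)
              for i in range(1, len(series)))
    return num / denom

def trend_slope(series):
    # Least-squares slope of the series against time index 0..n-1 (needs n >= 2).
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def meta_features(series):
    # Hypothetical meta-feature vector for one time series.
    return {
        "length": len(series),
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "lag1_autocorr": lag1_autocorrelation(series),
        "trend_slope": trend_slope(series),
    }

# Toy series, not study data.
features = meta_features([10, 12, 14, 16, 18])
print(features)
```

A vector like this, computed per series and paired with the ARIMA or TBATS label, would form one training row for a classifier such as a decision tree.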