Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
Sci Rep. 2022 Feb 14;12(1):2467. doi: 10.1038/s41598-022-06218-3.
This study aims to develop an assumption-free, data-driven model to accurately forecast the spread of COVID-19. To this end, we first employed Bayesian optimization to tune the hyperparameters of Gaussian process regression (GPR), yielding an efficient GPR-based model for forecasting recovered and confirmed COVID-19 cases in two highly impacted countries, India and Brazil. However, standard machine learning models do not account for the time dependency in COVID-19 data series. To alleviate this limitation, we incorporated dynamic information by introducing lagged measurements as inputs when constructing the investigated machine learning models. Additionally, we assessed the contribution of the incorporated features to the COVID-19 prediction using the Random Forest algorithm. The results reveal that the proposed dynamic machine learning models provide a significant improvement over their static counterparts. The results also highlight the superior performance of the dynamic GPR compared to the other models (i.e., Support vector regression, Boosted trees, Bagged trees, Decision tree, Random Forest, and XGBoost), achieving an average mean absolute percentage error (MAPE) of around 0.1%. Finally, we provided the confidence level of the predicted results based on the dynamic GPR model and showed that the predictions fall within the 95% confidence interval. This study presents a promising shallow and simple approach for predicting COVID-19 spread.
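The idea of introducing lagged measurements can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `make_lagged` and `mape`, the choice of three lags, and the case counts are all assumptions made for illustration. Each row of `X` holds the previous values of the series and `y` holds the value to forecast, so any regressor (GPR included) trained on `(X, y)` sees recent history as input features. A simple persistence baseline stands in for the model here.

```python
def make_lagged(series, n_lags):
    """Build a lagged design matrix (hypothetical helper).

    Row i of X holds the n_lags values that precede target y[i].
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    y = [series[t] for t in range(n_lags, len(series))]
    return X, y


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up cumulative case counts, for illustration only.
cases = [100, 110, 125, 145, 170, 200, 235]

# Assumed lag order of 3; the abstract does not state the value used.
X, y = make_lagged(cases, n_lags=3)
# X[0] == [100, 110, 125] and y[0] == 145

# Persistence baseline (not the paper's model): predict the most recent lag.
baseline = [row[-1] for row in X]
print(f"Persistence baseline MAPE: {mape(y, baseline):.2f}%")
```

In practice the rows of `X` would be passed to a regression model in place of the persistence baseline, and the same `mape` function would score its forecasts on held-out data.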