Jing Nan, Shi Zijing, Hu Yi, Yuan Ji
SHU-UTS SILC Business School, Shanghai University, Shanghai, 201800 China.
Onewo Space-Tech Service Co., Ltd., Shenzhen, 518049 China.
Appl Intell (Dordr). 2022;52(3):3303-3318. doi: 10.1007/s10489-021-02616-8. Epub 2021 Jul 5.
Coronavirus disease 2019 (COVID-19) is rapidly becoming one of the leading causes of mortality worldwide. Previous work has built various models to study the spread characteristics and trends of the COVID-19 pandemic. Nevertheless, because of limited information and data sources, understanding of the spread and impact of the pandemic remains restricted. This paper therefore models not only daily historical time-series data of COVID-19 but also regional attributes, e.g., geographic and local factors, which may have played an important role in the number of confirmed COVID-19 cases in certain regions. On this basis, the study conducts a comprehensive cross-sectional analysis and data-driven forecasting of the pandemic. The critical features that significantly influence the COVID-19 infection rate are determined using the XGB (eXtreme Gradient Boosting) algorithm and SHAP (SHapley Additive exPlanations), and the results are compared with those of the RF (Random Forest) and LGB (Light Gradient Boosting) models. To forecast the number of confirmed COVID-19 cases more accurately, a Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN) is applied. This model performs better than SVR (Support Vector Regression) and the encoder-decoder network on the experimental dataset. Model performance is evaluated using three statistical metrics, i.e., MAE, RMSE and [third metric missing from the source]. This study is expected to serve as a meaningful reference for the control and prevention of the COVID-19 pandemic.
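As a minimal sketch of the evaluation step, the two error metrics named in the abstract, MAE (mean absolute error) and RMSE (root mean squared error), could be computed as follows. The function names and the sample case counts are illustrative assumptions and are not taken from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_sq)


# Hypothetical daily confirmed-case counts (not data from the study).
actual_cases = [100.0, 120.0, 150.0, 170.0]
predicted_cases = [110.0, 115.0, 140.0, 180.0]

mae_value = mae(actual_cases, predicted_cases)
rmse_value = rmse(actual_cases, predicted_cases)
print(f"MAE = {mae_value:.2f}, RMSE = {rmse_value:.2f}")
```

Lower values of both metrics indicate forecasts closer to the observed counts; RMSE penalizes large individual errors more heavily than MAE.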