Universidade Federal do Rio Grande do Norte, Av. Salgado Filho, 3000, Campus Universitário, 59.078-970, Natal, Brazil.
Instituto Federal do Rio Grande do Norte, R. Raimundo Firmino de Oliveira, 400, 59.628-330, Mossoró, Brazil.
Environ Res. 2022 Mar;204(Pt D):112348. doi: 10.1016/j.envres.2021.112348. Epub 2021 Nov 9.
Since the start of the COVID-19 pandemic, many studies have investigated the correlation between climate variables, such as air quality, humidity, and temperature, and the lethality of COVID-19 around the world. In this work, we investigate the use of climate variables as additional features to train a data-driven multivariate forecast model that predicts the short-term expected number of COVID-19 deaths in Brazilian states and major cities. The main idea is that adding these climate features as inputs to the training of data-driven models improves predictive performance compared to equivalent single-input models. We use a Stacked LSTM as the network architecture for both the multivariate and univariate models. We compare both approaches by training forecast models for the COVID-19 deaths time series of the city of São Paulo. In addition, we present a preliminary analysis based on K-means clustering of Air Quality Index (AQI) curves. Its results are intended to enable transfer learning: when a new locality is added to the task, its forecast can be regressed using a model trained for the cluster whose AQI curves are most similar. The experiments show that the best multivariate model is more skilled than the best standard data-driven univariate model that we could find, using as evaluation metrics the average fitting error, the average forecast error, and the profile of the accumulated deaths for the forecast. These results suggest that adding more useful features as inputs to a multivariate approach could further improve the quality of the prediction models.
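The difference between the univariate and multivariate setups comes down to what each input time step contains. The following is a minimal sketch of how daily series could be sliced into supervised samples for a sequence model such as a stacked LSTM. It is not the authors' preprocessing: the function name `make_windows`, the parameters `lookback` and `horizon`, and all numbers are hypothetical, since the abstract does not state window lengths or data formats.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    lookback: int,
    horizon: int,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Slice aligned daily series into (input window, forecast target) pairs.

    features: one row per day, one column per input variable.
    target:   daily deaths, same length as features.
    Returns X shaped (samples, lookback, n_features) and y shaped
    (samples, horizon), as nested lists.
    """
    if len(features) != len(target):
        raise ValueError("features and target must cover the same days")
    X, y = [], []
    last_start = len(target) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        X.append([list(row) for row in features[start:end]])
        y.append(list(target[end:end + horizon]))
    return X, y


# Toy, made-up numbers purely for illustration (not data from the paper).
deaths = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 19.0, 22.0]
aqi = [40.0, 42.0, 55.0, 50.0, 61.0, 58.0, 47.0, 49.0]
temperature = [24.0, 25.0, 23.0, 22.0, 21.0, 22.0, 24.0, 25.0]
humidity = [70.0, 68.0, 75.0, 80.0, 78.0, 72.0, 69.0, 71.0]

# Univariate: each time step carries only the past death count.
univariate_rows = [[d] for d in deaths]
# Multivariate: each time step also carries the climate variables.
multivariate_rows = [list(r) for r in zip(deaths, aqi, temperature, humidity)]

X_uni, y_uni = make_windows(univariate_rows, deaths, lookback=3, horizon=2)
X_multi, y_multi = make_windows(multivariate_rows, deaths, lookback=3, horizon=2)
```

Both variants share the same forecast targets and the same number of samples; only the width of each time step changes (1 feature versus 4 here). This is what allows the same network architecture to be trained on either input and compared like for like.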