School of Electronics and Information Engineering, Soochow University, Suzhou, China.
Joint Shantou International Eye Center, Shantou University & the Chinese University of Hong Kong, Shantou, China.
J Med Internet Res. 2021 Jun 14;23(6):e24285. doi: 10.2196/24285.
Predicting the daily incidence of COVID-19 in advance can aid policy making on the prevention of disease spread, which can profoundly affect people's livelihoods. Previous studies investigated predictions for one or several countries and territories.
We aimed to develop models that can be applied for real-time prediction of COVID-19 activity in all individual countries and territories worldwide.
Previous daily incidence data and infoveillance data (search volume data from Google Trends) were collected from 215 individual countries and territories. A random forest regression algorithm was used to train models to predict the daily new confirmed cases 7 days ahead. Several methods were used to optimize the models, including clustering the countries and territories, selecting features according to their importance scores, performing multiple-step forecasting, and updating the models at regular intervals. The performance of the models was assessed using the mean absolute error (MAE), root mean square error (RMSE), Pearson correlation coefficient, and Spearman correlation coefficient.
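As a rough illustration of what predicting 7 days ahead implies for the training data, the sketch below pairs a window of past case counts and search volumes with the case count observed 7 days after the window ends. The function name `make_supervised`, the `n_lags` window length, and the feature layout are assumptions made for this example; the abstract does not specify how the features were constructed.

```python
from typing import List, Sequence, Tuple


def make_supervised(
    cases: Sequence[float],
    trends: Sequence[float],
    n_lags: int = 7,      # assumed window length, not stated in the abstract
    horizon: int = 7,     # forecast 7 days ahead, as in the abstract
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, target) pairs for direct multi-day-ahead forecasting.

    Each feature row holds the last `n_lags` daily case counts followed by
    the last `n_lags` daily search volumes; the target is the case count
    `horizon` days after the final day in the window.
    """
    if len(cases) != len(trends):
        raise ValueError("cases and trends must cover the same days")

    X: List[List[float]] = []
    y: List[float] = []
    # t is the index of the last observed day in each window.
    for t in range(n_lags - 1, len(cases) - horizon):
        window = slice(t - n_lags + 1, t + 1)
        X.append(list(cases[window]) + list(trends[window]))
        y.append(cases[t + horizon])
    return X, y


if __name__ == "__main__":
    # Synthetic data for illustration only.
    daily_cases = [float(i) for i in range(20)]
    search_volume = [10.0 * i for i in range(20)]
    X, y = make_supervised(daily_cases, search_volume, n_lags=3, horizon=7)
    print(len(X), X[0], y[0])
```

Rows built this way could be passed to any regressor, for example a random forest; retraining on a growing window would correspond to the regular model updates mentioned above.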
Our models can accurately predict the daily new confirmed cases of COVID-19 in most countries and territories. Of the 215 countries and territories under study, 198 (92.1%) had MAEs <10 and 187 (87.0%) had Pearson correlation coefficients >0.8. For the 215 countries and territories, the mean MAE was 5.42 (range 0.26-15.32), the mean RMSE was 9.27 (range 1.81-24.40), the mean Pearson correlation coefficient was 0.89 (range 0.08-0.99), and the mean Spearman correlation coefficient was 0.84 (range 0.2-1.00).
By integrating previous incidence and Google Trends data, our machine learning algorithm was able to predict the incidence of COVID-19 in most individual countries and territories accurately 7 days ahead.