Department of Ophthalmology, Gloucestershire Hospitals NHS Foundation Trust, Cheltenham, United Kingdom.
Department of Chemistry, University of Turin, Turin, Italy.
J Med Internet Res. 2021 Aug 11;23(8):e28876. doi: 10.2196/28876.
Previous studies have suggested associations between web search trends and traditional COVID-19 metrics. It remains unclear whether models that incorporate digital search trends yield better predictions.
The aim of this study is to investigate the relationship between Google Trends searches of symptoms associated with COVID-19 and confirmed COVID-19 cases and deaths. We aim to develop predictive models to forecast the COVID-19 epidemic based on a combination of Google Trends searches of symptoms and conventional COVID-19 metrics.
An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal component analysis (PCA) and time series modeling. The application facilitates the analysis of symptom search behavior associated with COVID-19 in 188 countries. In this study, we selected the data of nine countries as case studies to represent all continents. PCA was used to reduce the dimensionality of the data, and three different time series models (error, trend, seasonality [ETS]; autoregressive integrated moving average [ARIMA]; and feed-forward neural network autoregression) were used to predict COVID-19 metrics over the following 14 days. The models were compared in terms of predictive ability using the root mean square error (RMSE) of the first principal component (PC1). The predictive ability of models fitted with both Google Trends data and conventional COVID-19 metrics was compared with that of models fitted with conventional COVID-19 metrics only.
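The pipeline described above (dimensionality reduction to PC1, a 14-day forecast, and an RMSE comparison on held-out data) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, all function names are hypothetical, and a plain least-squares autoregressive model stands in for the ETS, ARIMA, and neural network models named in the text. For brevity, PC1 is computed on the full series before the last 14 days are held out, which is a simplification.

```python
import numpy as np

def first_principal_component(X):
    """Return PC1 scores of the column-standardized matrix X (rows = days)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def fit_ar(y, p):
    """Least-squares fit of y[t] = c + a1*y[t-1] + ... + ap*y[t-p]."""
    rows = [np.concatenate(([1.0], y[t - p:t][::-1])) for t in range(p, len(y))]
    coef, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the lag weights

def forecast_ar(y, coef, horizon):
    """Iterate the fitted AR model forward by `horizon` steps."""
    p = len(coef) - 1
    history = list(y)
    for _ in range(horizon):
        lags = np.array(history[-p:][::-1])  # most recent value first
        history.append(coef[0] + coef[1:] @ lags)
    return np.array(history[len(y):])

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in data (assumption): three noisy series sharing one wave.
rng = np.random.default_rng(0)
days = 120
wave = 1.0 + np.sin(np.linspace(0.0, 3.0 * np.pi, days))
X = np.column_stack([
    1000 * wave + rng.normal(0, 30, days),  # hypothetical daily cases
    20 * wave + rng.normal(0, 1, days),     # hypothetical daily deaths
    50 * wave + rng.normal(0, 3, days),     # hypothetical search interest
])

pc1 = first_principal_component(X)
horizon = 14
train, test = pc1[:-horizon], pc1[-horizon:]
coef = fit_ar(train, p=7)
prediction = forecast_ar(train, coef, horizon)
error = rmse(test, prediction)
print(f"RMSE of the 14-day PC1 forecast: {error:.3f}")
```

Running the same steps once with and once without the search-interest column would give two RMSE values that can be compared in the way the study describes.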
The degree of correlation and the best time lag varied as a function of the selected country and topic searched; in general, the optimal time lag was within 15 days. Overall, predictions of PC1 based on both search terms and traditional COVID-19 metrics performed better than those not including Google searches (median RMSE 1.56, IQR 0.90-2.49 versus median RMSE 1.87, IQR 1.09-2.95, respectively), but the improvement in prediction varied as a function of the selected country and time frame. The best-performing model varied with the country, time range, and time period selected. Models based on a 7-day moving average led to considerably smaller RMSE values than those calculated with raw data (median 0.90, IQR 0.50-1.53 versus median 2.27, IQR 1.62-3.74, respectively).
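The two preprocessing ideas in these results, smoothing with a 7-day moving average and searching for the time lag at which searches correlate best with cases, can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: the function names are hypothetical, the lag is chosen by maximizing the Pearson correlation, searches are assumed to lead cases, and the 15-day search window simply mirrors the range reported above.

```python
import numpy as np

def moving_average(x, window=7):
    """Mean of each consecutive `window`-day span; returns len(x) - window + 1 values."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def best_lag(searches, cases, max_lag=15):
    """Return (lag, r) maximizing Pearson r between searches[t] and cases[t + lag]."""
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        a = searches[:len(searches) - lag]  # searches on day t
        b = cases[lag:]                     # cases on day t + lag
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic example (assumption): cases repeat the search signal 5 days later.
rng = np.random.default_rng(1)
searches = rng.normal(size=100)
cases = np.concatenate((rng.normal(size=5), searches[:95]))

lag, r = best_lag(searches, cases)
print(f"best lag: {lag} days (r = {r:.2f})")

smoothed = moving_average(np.arange(10.0))
```

In practice the smoothing would be applied to both series before the lag search, since day-of-week reporting effects in raw counts can otherwise dominate the correlation.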
The inclusion of online search data in statistical models may improve the nowcasting and forecasting of the COVID-19 epidemic and could serve as one of the surveillance systems for COVID-19. We provide a free web application operating with nearly real-time data that anyone can use to make predictions of outbreaks, improve estimates of the dynamics of ongoing epidemics, and predict future or rebound waves.