Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada.
JMIR Public Health Surveill. 2020 Apr 14;6(2):e18828. doi: 10.2196/18828.
The recent global outbreak of coronavirus disease (COVID-19) is affecting many countries worldwide, and Iran is among the 10 most affected. Search engines capture data on population behavior, and these data might be useful for analyzing epidemics. Applying data mining methods to such electronic data sources might provide better insight into the COVID-19 outbreak and help manage the health crisis in each country and worldwide.
This study aimed to predict the incidence of COVID-19 in Iran.
Data were obtained from the Google Trends website. Linear regression and long short-term memory (LSTM) models were used to estimate the number of positive COVID-19 cases. All models were evaluated using 10-fold cross-validation, and root mean square error (RMSE) was used as the performance metric.
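The evaluation procedure described above (fit a model, score it with 10-fold cross-validation, and report RMSE as a mean with its SD across folds) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the study's Google Trends or case data, the only predictor is the previous day's count, the folds are formed by random shuffling (the abstract does not say how the folds were built), and the function names are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b  # (intercept, slope)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        statistics.fmean((a - p) ** 2 for a, p in zip(actual, predicted))
    )


def cross_validated_rmse(xs, ys, k=10, seed=0):
    """Return (mean, SD) of the per-fold RMSE over k shuffled folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic daily case counts (NOT the study's data): a noisy upward trend.
rng = random.Random(42)
cases = [max(0.0, 5.0 * day + rng.gauss(0, 10)) for day in range(61)]

# Predict today's count from the previous day's count.
prev_day, today = cases[:-1], cases[1:]
mean_rmse, sd_rmse = cross_validated_rmse(prev_day, today)
print(f"RMSE {mean_rmse:.3f} (SD {sd_rmse:.3f})")
```

Randomly shuffled folds let the model see days that come after the held-out ones, so for a real forecasting task a time-ordered split may be the more conservative choice.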
The linear regression model predicted the incidence with an RMSE of 7.562 (SD 6.492). Besides the previous day's incidence, the most influential predictors were the search frequencies for the handwashing, hand sanitizer, and antiseptic topics. The LSTM model had an RMSE of 27.187 (SD 20.705).
Data mining algorithms can be employed to predict outbreak trends. Such predictions might help policymakers and health care managers plan and allocate health care resources accordingly.