Computer Science Department, Polytechnic Building, University of Alcalá, Ctra. De Barcelona km. 33.6, 28871, Alcalá de Henares, Madrid, Spain.
Sci Rep. 2020 Mar 16;10(1):4747. doi: 10.1038/s41598-020-61686-9.
Internet technologies have demonstrated their value for the early detection and prediction of epidemics. In many cases, electronic surveillance systems can be built by collecting and analyzing online data, complementing other existing monitoring resources. This paper reports on the feasibility of building such a system with search engine and social network data. Specifically, the study aims to gather evidence on which kind of data source leads to better results. A system gathered real-time data from the Internet for 23 weeks. Data on influenza in Greece were collected from Google and Twitter and compared with influenza data from the official European authority. The data were analyzed using two models: an ARIMA model, which computed estimates based on weekly sums, and a customized approximate model, which uses daily sums. Results indicate that influenza was successfully monitored during the test period. Google data show a high Pearson correlation and a relatively low Mean Absolute Percentage Error (R = 0.933, MAPE = 21.358). Twitter results are slightly better (R = 0.943, MAPE = 18.742). The alternative model performs slightly worse than the ARIMA(X) model (R = 0.863, MAPE = 22.614) and has a higher mean deviation (abs. mean dev: 5.99% vs 4.74%).
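The two evaluation metrics quoted above, Pearson correlation (R) and Mean Absolute Percentage Error (MAPE), can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code: the function names `pearson_r` and `mape` are hypothetical, MAPE is assumed to be expressed in percent, and the weekly values in the usage example are made up for demonstration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (numerator) and the two deviation norms.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumed convention)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up weekly values, for illustration only (not data from the study).
    official = [120.0, 150.0, 210.0, 260.0, 190.0]
    estimated = [110.0, 160.0, 200.0, 280.0, 170.0]
    print(f"R = {pearson_r(official, estimated):.3f}")
    print(f"MAPE = {mape(official, estimated):.3f}")
```

A higher R means the estimated series tracks the shape of the official series more closely, while a lower MAPE means the estimates are closer in magnitude; the two can disagree, which is why the abstract reports both.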