Samaras Loukas, Sicilia Miguel-Angel, García-Barriocanal Elena
Computer Science Department, Polytechnic Building, University of Alcalá, Ctra. De Barcelona km. 33.6, 28871, Alcalá de Henares (Madrid), Spain.
BMC Public Health. 2021 Jan 21;21(1):100. doi: 10.1186/s12889-020-10106-8.
In recent years, new forms of syndromic surveillance that use Internet data have been proposed to assist the early prediction of epidemics across various diseases. These systems have been found to monitor and predict outbreaks accurately, before the outbreaks are observed in the population, and can therefore complement other surveillance methods. In this research, we examine a highly infectious disease, measles, because the literature on forecasting measles using Internet data is limited.
This research used 5 years (2013-2018) of official measles data from the competent European Union authority, the European Centre for Disease Prevention and Control (ECDC), together with data retrieved from Google Trends using Python scripts. We compared regression models forecasting the development of measles in five countries.
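The abstract does not specify the regression models. As a minimal sketch of the simplest variant, the Python below fits an ordinary least squares line that estimates monthly measles cases from a Google Trends search-interest index for one country. The function names `fit_ols` and `predict` and the toy series are hypothetical illustrations, not the study's code or data.

```python
# Minimal sketch (assumption): simple ordinary least squares regression of
# monthly measles cases on a Google Trends search-interest index.
# The series below are made-up toy numbers, NOT the study's data.

def fit_ols(x, y):
    """Return (intercept, slope) minimising the squared error of y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Apply the fitted line to each value of the predictor series."""
    return [intercept + slope * xi for xi in x]


# Hypothetical monthly series for one country.
trends_index = [10, 12, 18, 25, 40, 55, 38, 22, 15, 11]
observed_cases = [30, 34, 52, 70, 115, 160, 110, 64, 44, 32]

a, b = fit_ols(trends_index, observed_cases)
estimated_cases = predict(a, b, trends_index)
print(f"cases ~ {a:.2f} + {b:.2f} * trends_index")
```

A single predictor keeps the example readable. A real model could add several search terms or seasonal terms as extra regressors.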
Results show that measles can be estimated and predicted through Google Trends in terms of timing, volume and overall spread. The combined results reveal a strong relationship between observed and predicted measles cases (correlation coefficient R = 0.779, two-tailed p < 0.01). The mean standard error for the combined results was relatively low, at 45.2 (12.19%). However, major differences and deviations were observed for countries with a relatively low measles burden, such as the United Kingdom and Spain. For these countries, alternative models were tested in an attempt to improve the results.
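The Python below is a sketch of how observed and predicted case series can be compared. It computes the Pearson correlation coefficient R and a mean absolute error, expressed both in cases and as a percentage of the mean observed count. The abstract does not define how the 45.2 (12.19%) error figure was calculated, so the error metric and its percentage form here are assumptions. The two-tailed significance test is not reproduced.

```python
import math

# Sketch (assumption): agreement metrics between observed and predicted cases.
# The percentage normalisation below is illustrative, not the paper's definition.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_error_pct(observed, predicted):
    """Mean absolute error as a percentage of the mean observed count."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * mean_absolute_error(observed, predicted) / mean_obs


# Hypothetical example values, not study results.
observed = [30, 52, 115, 160, 64]
predicted = [35, 48, 100, 150, 70]
print(f"R = {pearson_r(observed, predicted):.3f}")
print(f"MAE = {mean_absolute_error(observed, predicted):.1f} cases "
      f"({relative_error_pct(observed, predicted):.2f}%)")
```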
Estimating measles cases from Google Trends produces acceptable results and can help predict outbreaks robustly, at least 2 months in advance. The Python scripts can be used on their own or within an integrated Internet surveillance system for tracking epidemics, such as the one addressed here.
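Predicting an outbreak "at least 2 months in advance" implies pairing search interest in one month with cases reported later. The Python below sketches that idea under stated assumptions. It builds lagged pairs (Trends index at month t, cases at month t + lag), fits a simple least squares line on them, and applies the line to the latest Trends value to forecast cases `lag` months ahead. The helper names and the 2-month default are illustrative assumptions, not the authors' implementation.

```python
# Sketch (assumption): lagged regression for forecasting cases `lag` months ahead
# from a Google Trends index. Helper names are hypothetical.

def make_lagged_pairs(trends, cases, lag):
    """Pair search interest at month t with cases at month t + lag."""
    if len(trends) != len(cases):
        raise ValueError("series must cover the same months")
    if lag < 1 or lag >= len(trends) - 1:
        raise ValueError("lag must leave at least 2 paired months")
    return trends[:len(trends) - lag], cases[lag:]


def forecast_ahead(trends, cases, lag=2):
    """Fit cases[t + lag] ~ a + b * trends[t], then forecast from trends[-1]."""
    x, y = make_lagged_pairs(trends, cases, lag)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Estimate for the month `lag` steps after the last observed Trends value.
    return intercept + slope * trends[-1]


# Hypothetical toy series (months in order).
print(forecast_ahead([10, 12, 18, 25, 40, 55], [30, 31, 34, 52, 70, 115], lag=2))
```

Shifting the case series rather than the Trends series keeps every forecast causal: the model only uses search data available at the time of prediction.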