Lampos Vasileios, Majumder Maimuna S, Yom-Tov Elad, Edelstein Michael, Moura Simon, Hamada Yohhei, Rangaka Molebogeng X, McKendry Rachel A, Cox Ingemar J
Department of Computer Science, University College London, London, UK.
Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
NPJ Digit Med. 2021 Feb 8;4(1):17. doi: 10.1038/s41746-021-00384-w.
Previous research has demonstrated that various properties of infectious diseases can be inferred from online search behaviour. In this work we use time series of online search query frequencies to gain insights about the prevalence of COVID-19 in multiple countries. We first develop unsupervised modelling techniques based on associated symptom categories identified by the United Kingdom's National Health Service and Public Health England. We then attempt to minimise an expected bias in these signals caused by public interest (as opposed to infections), using the proportion of news media coverage devoted to COVID-19 as a proxy indicator. Our analysis indicates that models based on online searches precede the reported confirmed cases and deaths by 16.7 (10.2-23.2) and 22.1 (17.4-26.9) days, respectively. We also investigate transfer learning techniques for mapping supervised models from countries where the spread of the disease has progressed extensively to countries that are in earlier phases of their respective epidemic curves. Furthermore, we compare time series of online search activity against confirmed COVID-19 cases or deaths jointly across multiple countries, uncovering interesting querying patterns, including the finding that rarer symptoms are better predictors than common ones. Finally, we show that web searches improve the short-term forecasting accuracy of autoregressive models for COVID-19 deaths. Our work provides evidence that online search data can be used to develop complementary public health surveillance methods to help inform the COVID-19 response in conjunction with more established approaches.
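The lead times reported above describe how far a search-based signal runs ahead of reported cases or deaths. One common way to estimate such a lead is to shift one series against the other and keep the shift with the highest correlation. The sketch below illustrates that idea only; it is not necessarily the procedure the authors used. The function name `best_lead_days`, the `max_lag` parameter, and the data are hypothetical, and both series are synthetic.

```python
import numpy as np

def best_lead_days(search, cases, max_lag=40):
    """Return (lag, r): the number of days by which `search` leads `cases`
    that maximises the Pearson correlation between the two series."""
    search = np.asarray(search, dtype=float)
    cases = np.asarray(cases, dtype=float)
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        # Pair search[t] with cases[t + lag], i.e. search leads by `lag` days.
        x = search[: len(search) - lag]
        y = cases[lag:]
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic illustration only: a bell-shaped "search" curve and a copy
# of it delayed by 17 days standing in for reported cases.
days = np.arange(150)
search = np.exp(-0.5 * ((days - 50) / 12.0) ** 2)
cases = np.exp(-0.5 * ((days - 67) / 12.0) ** 2)

lag, r = best_lead_days(search, cases)
print(f"estimated lead: {lag} days (r = {r:.3f})")
```

On real data the two series would typically need smoothing and a common daily index first, and a confidence interval such as those quoted in the abstract would require repeating the estimate across countries or resamples.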