Division of Epidemiology & Biostatistics, Graduate School of Public Health, San Diego State University, San Diego, CA, USA.
Department of Epidemiology & Biostatistics, University of Arizona College of Public Health, Tucson, AZ, USA.
BMC Public Health. 2018 Apr 3;18(1):445. doi: 10.1186/s12889-018-5367-z.
Respiratory syncytial virus (RSV) is the leading cause of hospitalization in children younger than 1 year of age in the United States. Internet search engine queries may provide high-resolution temporal and spatial data with which to estimate and predict disease activity.
After filtering an initial list of 613 symptoms using high-resolution Bing search logs, we used Google Trends data from 2004 to 2016 for a reduced list of 50 terms to build predictive models of RSV incidence for five states where long-term surveillance data were available. We then used domain adaptation to model RSV incidence for the remaining 45 US states.
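The abstract does not state which model family was used. As a minimal sketch, assuming an ordinary least-squares linear model with an intercept, weekly incidence in one state could be regressed on the weekly Google Trends volumes of the retained terms. The function names (`fit_ols`, `predict`) and the toy data are hypothetical, and the sketch does not reproduce the domain-adaptation step used for the other 45 states.

```python
"""Hypothetical sketch (not the authors' model): fit weekly RSV incidence
as a linear function of search-term volumes by ordinary least squares."""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so each row operation updates a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_ols(features: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error (normal equations)."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs: Sequence[float], feature_row: Sequence[float]) -> float:
    """Predicted incidence for one week's feature values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


if __name__ == "__main__":
    # Toy weekly volumes for two hypothetical terms; incidence built as 2 + 3*x1 + 0.5*x2.
    weeks = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
    incidence = [6.0, 8.5, 13.0, 15.5, 19.5]
    coefs = fit_ols(weeks, incidence)
    print([round(c, 6) for c in coefs])
```

With 50 correlated terms and relatively few seasons, a plain least-squares fit would likely overfit; a regularized model would be a more realistic choice, but plain OLS keeps the sketch short.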
Surveillance data sources (hospitalization and laboratory reports) were highly correlated with each other, as were laboratory reports with search engine data. The four terms that were most often statistically significantly correlated, as time series, with the surveillance data in the five state models were RSV, flu, pneumonia, and bronchiolitis. Using our models, we tracked the spread of RSV by observing the time of peak use of the search term in different states. In general, the RSV peak moved from the southeast (Florida) to the northwest of the US.
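The two analyses described above, correlating search-term series with surveillance series and comparing peak timing across states, can be sketched as follows. This assumes Pearson correlation on aligned weekly series and takes a state's peak as the first week of maximum search volume; these choices, the helper names (`pearson`, `peak_week`, `order_by_peak`), and the state data in the usage lines are illustrative assumptions rather than the authors' procedure, and no significance test is computed.

```python
"""Hypothetical sketch: correlate a search-term series with surveillance
counts, and order states by the week in which search volume peaks."""
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two aligned, equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def peak_week(series: Sequence[float]) -> int:
    """Index of the first week holding the maximum value."""
    return max(range(len(series)), key=lambda i: series[i])


def order_by_peak(by_state: Dict[str, Sequence[float]]) -> List[str]:
    """State codes sorted from earliest to latest peak week."""
    return sorted(by_state, key=lambda s: peak_week(by_state[s]))


if __name__ == "__main__":
    # Made-up weekly search volumes for one season (illustrative only).
    weekly = {
        "WA": [1, 2, 3, 5, 4],
        "FL": [3, 6, 4, 2, 1],
        "GA": [2, 4, 7, 3, 1],
    }
    print(order_by_peak(weekly))  # → ['FL', 'GA', 'WA']
    print(round(pearson(weekly["FL"], weekly["GA"]), 3))
```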
Our study represents the first time that RSV has been tracked using Internet data, and it highlights the successful use of search filters and domain adaptation techniques with data at multiple resolutions. Our approach may help identify both local and more widespread RSV transmission and may be applicable to other seasonal conditions for which comprehensive epidemiological data are difficult to collect or obtain.