Domnich Alexander, Panatto Donatella, Signori Alessio, Lai Piero Luigi, Gasparini Roberto, Amicizia Daniela
Department of Health Sciences, University of Genoa, Genoa, Italy.
Department of Health Sciences, University of Genoa, Genoa, Italy; Inter-University Centre of Research on Influenza and other Transmissible Infections (CIRI-IT), Genoa, Italy.
PLoS One. 2015 May 26;10(5):e0127754. doi: 10.1371/journal.pone.0127754. eCollection 2015.
Web queries are now widely used for modeling, nowcasting and forecasting influenza-like illness (ILI). However, given that ILI attack rates vary significantly across ages, in terms of both magnitude and timing, little is known about whether the association between ILI morbidity and ILI-related queries is comparable across different age-groups. The present study aimed to investigate features of the association between ILI morbidity and ILI-related query volume from the perspective of age.
Since Google Flu Trends is unavailable in Italy, Google Trends was used to identify search terms that correlated highly with official ILI surveillance data. All-age and age-class-specific modeling was performed by means of linear models with generalized least-squares estimation. Hold-out validation was used to quantify prediction accuracy. For comparison, predictions generated by exponential smoothing were also computed.
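As a rough illustration of the comparison baseline, the sketch below runs simple exponential smoothing on a training series and scores one-step-ahead forecasts on a held-out season. The abstract does not state which smoothing variant, smoothing parameter, or forecast horizon the authors used, so the single-parameter form, `alpha = 0.5`, the one-step-ahead scheme, and the weekly values are all assumptions made up for this example.

```python
import math

def smooth_level(train, alpha):
    """Run simple exponential smoothing over train and return the final level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def holdout_forecasts(train, test, alpha):
    """One-step-ahead forecasts for each hold-out point.

    Each forecast is issued before the matching observation is seen;
    the level is then updated with that observation.
    """
    level = smooth_level(train, alpha)
    forecasts = []
    for y in test:
        forecasts.append(level)
        level = alpha * y + (1 - alpha) * level
    return forecasts

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weekly ILI rates (cases per 1,000); not data from the study.
train = [1.0, 1.5, 2.5, 4.0, 6.0, 8.0, 6.5, 4.0, 2.5, 1.5, 1.0, 1.2]
test = [2.0, 3.5, 6.0, 9.0, 7.0, 4.0, 2.0]

forecasts = holdout_forecasts(train, test, alpha=0.5)
print("RMSE:", round(rmse(test, forecasts), 3))
print("r:", round(pearson(test, forecasts), 3))
```

Scoring the same hold-out period with both an error measure and a correlation coefficient mirrors the two criteria the abstract reports, since a model can track the shape of the epidemic curve closely while still missing its magnitude.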
Five search terms showed high correlation coefficients (> .6) with ILI surveillance data. In comparison with exponential smoothing, the all-age query-based model correctly predicted the peak time and yielded a higher correlation coefficient with observed ILI morbidity (.978 vs. .929). However, query-based prediction of ILI morbidity was associated with a greater error. Age-class-specific query-based models varied significantly in terms of prediction accuracy. In the 0-4 and 25-44-year age-groups, the query-based models performed well and outperformed exponential smoothing predictions; in the 15-24 and ≥ 65-year age-classes, however, the query-based models were inaccurate and substantially overestimated peak height. In all but one age-class, peak timing predicted by the query-based models coincided with observed timing.
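The term-screening and peak-timing checks reported above can be sketched as follows. This is a minimal illustration only: the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption, and the term names (`term_a`, `term_b`, `term_c`) and all series values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen_terms(query_series, ili, threshold=0.6):
    """Keep the terms whose query volume correlates with ILI above threshold."""
    selected = {}
    for term, series in query_series.items():
        r = pearson(series, ili)
        if r > threshold:
            selected[term] = r
    return selected

def peak_week(series):
    """Index of the week with the highest value (first one on ties)."""
    return max(range(len(series)), key=lambda i: series[i])

# Hypothetical weekly values; not data from the study.
ili = [1, 2, 4, 8, 5, 2, 1]
queries = {
    "term_a": [10, 20, 45, 80, 50, 25, 12],   # tracks the epidemic curve
    "term_b": [50, 48, 52, 49, 51, 50, 47],   # roughly flat
    "term_c": [80, 50, 20, 10, 15, 40, 70],   # moves against the curve
}

selected = screen_terms(queries, ili)
for term, r in selected.items():
    same_peak = peak_week(queries[term]) == peak_week(ili)
    print(term, round(r, 3), "peak coincides:", same_peak)
```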
The accuracy of web query-based models in predicting ILI morbidity rates may differ across age-groups. Greater age-specific detail may be useful in query-based influenza studies in order to account for age-specific features of the epidemiology of ILI.