Seo Dong-Woo, Jo Min-Woo, Sohn Chang Hwan, Shin Soo-Yong, Lee JaeHo, Yu Maengsoo, Kim Won Young, Lim Kyoung Soo, Lee Sang-Il
Asan Medical Center, Department of Emergency Medicine, University of Ulsan, College of Medicine, Seoul, Republic of Korea.
J Med Internet Res. 2014 Dec 16;16(12):e289. doi: 10.2196/jmir.3680.
Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system in South Korea that uses Internet search query data.
The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data.
Our study was based on the local search engine Daum (approximately 25% market share) and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey of 200 participants was conducted to identify popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year as development set 1 and 2010/11 as validation set 1) and Set 2 (2010/11 as development set 2 and 2011/12 as validation set 2). Pearson's correlation coefficients between the Daum data and the ILI data were calculated for each development set. We selected the combined queries with correlation coefficients of .7 or higher and ranked them in descending order. We then created cumulative query method n, where n is the number of combined queries accumulated in descending order of correlation coefficient.
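The selection and accumulation steps can be sketched in Python. This is a minimal sketch, not the authors' code. It assumes that each combined query is available as a weekly search-volume series aligned with the weekly ILI series. It also assumes that cumulative query method n sums the volumes of the n highest-ranked queries week by week, since the exact aggregation is not stated here. The function names and toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient r for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_queries(query_volumes, ili, threshold=0.7):
    """Keep combined queries with r >= threshold against ILI, sorted by r (descending)."""
    scored = [(name, pearson(series, ili)) for name, series in query_volumes.items()]
    kept = [(name, r) for name, r in scored if r >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def cumulative_series(query_volumes, ranked, n):
    """Cumulative query method n (hypothetical helper).

    ASSUMPTION: the top-n ranked queries are accumulated by summing their
    weekly search volumes.
    """
    top = [query_volumes[name] for name, _ in ranked[:n]]
    return [sum(week) for week in zip(*top)]


# Hypothetical weekly data for one development season (toy numbers, not study data).
ili = [1, 2, 4, 8, 5, 2]
queries = {
    "q_a": [10, 20, 40, 80, 50, 20],
    "q_b": [5, 6, 9, 15, 10, 6],
    "q_c": [9, 3, 7, 2, 8, 5],
}

ranked = rank_queries(queries, ili)  # q_c is dropped (negative r)
for n in range(1, len(ranked) + 1):
    print(n, cumulative_series(queries, ranked, n))
```

Cumulative method 1 is simply the best single combined query. Each higher n adds the next-ranked query to the running total.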
In validation set 1, 13 cumulative query methods were applied, and 8 of them had higher correlation coefficients (min=.916, max=.943) than the highest-correlating single combined query. Further, 11 of the 13 cumulative query methods had an r value of ≥.7, whereas only 4 of the 13 combined queries did. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than the highest-correlating single combined query. All 15 cumulative query methods had an r value of ≥.7, whereas only 6 of the 15 combined queries did.
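The validation comparison can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's procedure. The query ranking is taken from the development set and applied unchanged to the validation season. As above, cumulative method n is assumed to sum weekly volumes. The counts mirror the ones reported in the results: how many single combined queries and how many cumulative methods reach r ≥ .7, and how many cumulative methods beat the best single query. The function name `evaluate` and all data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient r for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def evaluate(ranked_names, val_volumes, val_ili, threshold=0.7):
    """Compare single combined queries with cumulative methods 1..k on validation data.

    ranked_names is the development-set ranking (highest r first).
    ASSUMPTION: cumulative method n sums the weekly volumes of the first n queries.
    """
    single_r = [pearson(val_volumes[name], val_ili) for name in ranked_names]

    cumulative_r = []
    running = [0.0] * len(val_ili)
    for name in ranked_names:
        # Add the next-ranked query to the running weekly total.
        running = [acc + v for acc, v in zip(running, val_volumes[name])]
        cumulative_r.append(pearson(running, val_ili))

    best_single = max(single_r)
    return {
        "single_at_threshold": sum(r >= threshold for r in single_r),
        "cumulative_at_threshold": sum(r >= threshold for r in cumulative_r),
        "cumulative_beating_best_single": sum(r > best_single for r in cumulative_r),
    }


# Hypothetical validation-season data (toy numbers, not study data).
val_ili = [1, 3, 6, 4, 2]
val_volumes = {
    "q1": [2, 2, 6, 5, 1],
    "q2": [0, 4, 6, 3, 3],
    "q3": [3, 2, 1, 2, 3],
}
print(evaluate(["q1", "q2", "q3"], val_volumes, val_ili))
```

In this toy case, q3 correlates negatively with ILI on its own. The week-to-week noise in q1 and q2 cancels when the two are summed, so the cumulative methods can outperform every single query.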
The cumulative query method showed relatively higher correlations with national influenza surveillance data than the combined queries in both the development and validation sets.