Woo Hyekyung, Sung Cho Hyeon, Shim Eunyoung, Lee Jong Koo, Lee Kihwang, Song Gilyoung, Cho Youngtae
Department of Public Health Science, School of Public Health, Seoul National University, Seoul, Korea.
Department of Intelligent Cognitive Technology Research, Electronics and Telecommunications Research Institute, Daejeon, Korea.
Disaster Med Public Health Prep. 2018 Jun;12(3):352-359. doi: 10.1017/dmp.2017.84. Epub 2017 Jul 31.
Social media data are a highly contextual health information source. The objective of this study was to identify Korean keywords for detecting influenza epidemics from social media data.
We included data from Twitter and online blog posts to obtain a sufficient number of candidate indicators and to represent a larger proportion of the Korean population. We performed the following steps: initial keyword selection; generation of a keyword time series using a preprocessing approach; optimal feature selection; and model building and validation using the least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and random forest regression (RFR).
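The keyword time series and feature selection steps can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes posts arrive as (date, text) pairs, that a keyword counts as mentioned when it appears as a substring, and that counts are aggregated by ISO week. The function names, the example keywords, and the plain coordinate-descent LASSO (no intercept, objective (1/2n)||y - Xw||^2 + penalty * ||w||_1) are all illustrative choices.

```python
from collections import Counter
from datetime import date


def weekly_keyword_counts(posts, keywords):
    """Count posts mentioning each keyword per ISO (year, week).

    posts: iterable of (datetime.date, str) pairs (assumed input shape).
    keywords: list of candidate keyword strings (hypothetical examples).
    """
    counts = {kw: Counter() for kw in keywords}
    for day, text in posts:
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        for kw in keywords:
            if kw in text:  # simple substring match, an assumption
                counts[kw][week] += 1
    return counts


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exact zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Fit LASSO weights by cyclic coordinate descent (no intercept).

    X: list of rows, one row per week, one column per keyword series.
    y: list of influenza incidence values for the same weeks.
    Keywords whose weight is shrunk to exactly 0.0 are dropped.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (z / n) if z > 0 else 0.0
    return w
```

In use, the weekly counts for each candidate keyword would form the columns of X and the reported influenza incidence would form y. Keywords with a nonzero weight are the ones retained for the downstream SVM or RFR models.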
A total of 15 keywords, evenly distributed across the Twitter and blog data sources, optimally detected the influenza epidemic. Estimates generated by our SVM model were highly correlated with recent influenza incidence data.
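Agreement between model estimates and observed incidence can be quantified with a correlation coefficient. The abstract does not say which coefficient was used, so the sketch below assumes Pearson's r. The function name and inputs are illustrative.

```python
import math


def pearson_r(estimates, observed):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(estimates)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimates) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimates, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_e == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_o)
```

A value near 1 would indicate that weekly model estimates rise and fall together with the surveillance data.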
The basic principles underpinning our approach could be applied to other countries, languages, infectious diseases, and social media sources. Social media monitoring using our approach may support and extend the capacity of traditional surveillance systems for detecting emerging influenza. (Disaster Med Public Health Preparedness. 2018; 12: 352-359).