Identification of Keywords From Twitter and Web Blog Posts to Detect Influenza Epidemics in Korea.

Authors

Woo Hyekyung, Sung Cho Hyeon, Shim Eunyoung, Lee Jong Koo, Lee Kihwang, Song Gilyoung, Cho Youngtae

Affiliations

Department of Public Health Science, School of Public Health, Seoul National University, Seoul, Korea.

Department of Intelligent Cognitive Technology Research, Electronics and Telecommunications Research Institute, Daejeon, Korea.

Publication information

Disaster Med Public Health Prep. 2018 Jun;12(3):352-359. doi: 10.1017/dmp.2017.84. Epub 2017 Jul 31.

Abstract

OBJECTIVE

Social media data are a highly contextual health information source. The objective of this study was to identify Korean keywords for detecting influenza epidemics from social media data.

METHODS

We included data from Twitter and online blog posts to obtain a sufficient number of candidate indicators and to represent a larger proportion of the Korean population. We performed the following steps: initial keyword selection; generation of a keyword time series using a preprocessing approach; optimal feature selection; and model building and validation using the least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and random forest regression (RFR).
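The steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of real keyword time series, uses LASSO coefficients as the feature-selection step, and picks hyperparameters arbitrarily. The function names `make_synthetic_data` and `fit_and_validate` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_data(n_weeks=120, n_keywords=30, n_informative=5, seed=0):
    """Simulate weekly keyword frequencies and an influenza incidence series."""
    rng = np.random.default_rng(seed)
    weeks = np.arange(n_weeks)
    # Assumed seasonal signal with a 52-week period (illustration only).
    season = 1.0 + np.sin(2 * np.pi * weeks / 52.0)
    X = rng.normal(size=(n_weeks, n_keywords))
    # Only the first n_informative keyword columns track the seasonal signal.
    X[:, :n_informative] += 3.0 * season[:, None]
    y = 10.0 * season + rng.normal(scale=0.5, size=n_weeks)
    return X, y


def fit_and_validate(X, y, n_train=90, lasso_alpha=0.1):
    """Select keywords with LASSO, then fit LASSO, SVM and RFR on the selection."""
    # Chronological split: earlier weeks train, later weeks validate.
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    # Scale using training statistics only, to avoid leaking validation data.
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    # Keywords with a nonzero LASSO coefficient are kept as features.
    selector = Lasso(alpha=lasso_alpha, max_iter=10000).fit(X_train_s, y_train)
    selected = np.flatnonzero(selector.coef_)
    if selected.size == 0:
        # Fall back to all keywords if the penalty removed every one.
        selected = np.arange(X.shape[1])

    models = {
        "lasso": Lasso(alpha=lasso_alpha, max_iter=10000),
        "svm": SVR(kernel="rbf", C=10.0),
        "rfr": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    rmse = {}
    for name, model in models.items():
        model.fit(X_train_s[:, selected], y_train)
        pred = model.predict(X_test_s[:, selected])
        # Root-mean-square error on the held-out weeks.
        rmse[name] = float(np.sqrt(np.mean((pred - y_test) ** 2)))
    return selected, rmse


X, y = make_synthetic_data()
selected, rmse = fit_and_validate(X, y)
print("selected keyword columns:", selected.tolist())
for name, err in rmse.items():
    print(f"{name}: validation RMSE = {err:.3f}")
```

The chronological split is a design choice for this sketch: with time series data, a random split would let the models see weeks that follow the ones they are asked to estimate.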

RESULTS

A total of 15 keywords, evenly distributed across the Twitter and blog data sources, optimally detected the influenza epidemic. Estimates generated by our SVM model were highly correlated with recent influenza incidence data.
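As an illustration of how agreement between model estimates and observed incidence might be quantified, the sketch below computes a Pearson correlation coefficient. The abstract does not state which correlation measure was used, so Pearson's r is an assumption here, and the weekly values are made up for illustration; they are not data from the study.

```python
import math


def pearson_r(estimates, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(estimates) != len(observed) or len(estimates) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(observed) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimates, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_e == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_e * var_o)


# Made-up weekly values for illustration only; not data from the study.
model_estimates = [2.1, 3.0, 5.2, 8.9, 12.4, 9.8, 6.1, 3.3]
observed_incidence = [1.8, 3.4, 5.0, 9.5, 13.0, 9.1, 5.7, 3.0]
r = pearson_r(model_estimates, observed_incidence)
print(f"r = {r:.3f}")
```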

CONCLUSIONS

The basic principles underpinning our approach could be applied to other countries, languages, infectious diseases, and social media sources. Social media monitoring using our approach may support and extend the capacity of traditional surveillance systems for detecting emerging influenza. (Disaster Med Public Health Preparedness. 2018; 12: 352-359).
