Multifactorial Diseases and Complex Phenotypes Research Area, Bambino Gesù Children's Hospital IRCCS, Rome, Italy.
PLoS One. 2013 Dec 4;8(12):e82489. doi: 10.1371/journal.pone.0082489. eCollection 2013.
Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, in which each symptom is described by a technical term and all related jargon expressions identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. To geolocate messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation between the trend of our influenza-positive tweets and the influenza-like illness (ILI) trends identified by US traditional surveillance systems.
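The translation of a case definition into a Boolean query can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the symptom lexicon, the lay expressions, the grouping of symptoms, and the function names are hypothetical and are not taken from the study, whose expressions were learned from health-related web pages. The sketch treats each symptom as an OR over its technical term and lay expressions, and the case definition as an AND over symptom groups.

```python
import re

# Hypothetical lexicon: technical term -> expressions that count as a mention.
# In the study these expressions were identified by the algorithm; here they
# are hand-written placeholders.
SYMPTOM_TERMS = {
    "fever": ["fever", "high temperature", "burning up"],
    "headache": ["headache", "head is pounding"],
    "myalgia": ["myalgia", "body aches", "sore muscles"],
    "cough": ["cough", "coughing"],
    "sore throat": ["sore throat", "throat hurts"],
}

# Hypothetical case definition: a tweet is positive when it mentions at least
# one symptom from EVERY group (AND across groups, OR within a group).
CASE_DEFINITION = [
    {"fever", "headache", "myalgia"},
    {"cough", "sore throat"},
]


def symptoms_in(text):
    """Return the set of technical terms whose expressions occur in the text."""
    lowered = text.lower()
    found = set()
    for symptom, expressions in SYMPTOM_TERMS.items():
        # Whole-word match so that "cough" does not fire inside another word.
        if any(re.search(r"\b" + re.escape(e) + r"\b", lowered) for e in expressions):
            found.add(symptom)
    return found


def matches_case_definition(text):
    """Evaluate the Boolean query: every group needs at least one symptom."""
    found = symptoms_in(text)
    return all(found & group for group in CASE_DEFINITION)


if __name__ == "__main__":
    for tweet in ["I'm burning up and my throat hurts", "Just a cough today"]:
        print(tweet, "->", matches_case_definition(tweet))
```

Under these assumptions, a tweet that mentions only one symptom, or symptoms from a single group, does not satisfy the query. This mirrors the idea of monitoring combinations of symptoms rather than single keywords.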