Yom-Tov Elad, Borsa Diana, Cox Ingemar J, McKendry Rachel A
Microsoft Research Israel, Herzelia, Israel.
J Med Internet Res. 2014 Jun 18;16(6):e154. doi: 10.2196/jmir.3156.
Mass gatherings, such as music festivals and religious events, pose a health care challenge because of the risk of transmission of communicable diseases. This is exacerbated by the fact that participants disperse soon after the gathering, potentially spreading disease within their communities. The dispersion of participants also poses a challenge for traditional surveillance methods. The ubiquitous use of the Internet may enable the detection of disease outbreaks through analysis of data generated by users during events and shortly thereafter.
The intent of the study was to develop algorithms that can alert to possible outbreaks of communicable diseases from Internet data, specifically Twitter and search engine queries.
We extracted all Twitter postings and Bing search engine queries made by users who repeatedly mentioned one of nine major music festivals held in the United Kingdom or one religious event (the Hajj in Mecca) during 2012, covering a period of 30 days before and after each festival. We analyzed these data using three methods. Two compared the use of words associated with disease symptoms before and after the festival. The third compared the frequency of these words with their frequency among other users in the United Kingdom in the days following the festival.
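The abstract does not say which statistical test the before/after methods use. The following is a minimal sketch of one plausible way to flag a symptom. It assumes you have already counted how many users mentioned a given symptom word in the window before the festival and in the window after it. It then applies a Pearson chi-square test (1 degree of freedom, no continuity correction) to the resulting 2x2 table. The function names, the per-user counting scheme, and the default `alpha=0.01` are hypothetical illustrations, not the authors' method.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square variable with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def flag_symptom(with_symptom_before, users_before,
                 with_symptom_after, users_after, alpha=0.01):
    """Hypothetical rule: flag a symptom whose share of mentioning users rises significantly.

    Row 1 of the table is the post-festival window and row 2 the pre-festival window.
    Columns are users who did / did not mention the symptom word.
    """
    a = with_symptom_after
    b = users_after - with_symptom_after
    c = with_symptom_before
    d = users_before - with_symptom_before
    p = p_value_1df(chi_square_2x2(a, b, c, d))
    increased = a / users_after > c / users_before  # only increases count as alerts
    return increased and p < alpha, p
```

With made-up counts, `flag_symptom(20, 1000, 60, 1000)` (2% of users mentioning a symptom before the festival versus 6% after) returns a flag of `True`. A rise from 20 to 60 users gives a chi-square statistic of about 20.8. Checking the direction of change keeps a significant *decrease* from raising an alert.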
On average, the data for each festival comprised 7.5 million tweets made by 12,163 users and 32,143 queries made by 1756 users. Our methods indicated the statistically significant appearance of a disease symptom following two of the nine festivals. For example, cough was detected at higher-than-expected levels following the Wakestock festival. Where a statistically significant symptom was detected, there was statistically significant agreement between methods and across data sources (chi-square test, P<.01). Anecdotal evidence suggests that the detected symptoms do indicate a disease that some users attributed to attending the festival.
Our work shows the feasibility of creating a public health surveillance system for mass gatherings based on Internet data. The use of multiple data sources and analysis methods was found to be advantageous for rejecting false positives. Further studies are required in order to validate our findings with data from public health authorities.